Re: [PATCH 4/5] kvmtool: Save datamatch as little endian in {add,del}_event

2015-06-17 Thread Andreas Herrmann
On Tue, Jun 16, 2015 at 06:17:14PM +0100, Will Deacon wrote:
> On Mon, Jun 15, 2015 at 12:49:45PM +0100, Andreas Herrmann wrote:
> > W/o a dedicated endianness it's impossible to reliably find a match,
> > e.g. in kernel/virt/kvm/eventfd.c ioeventfd_in_range.
> 
> Hmm, but shouldn't this be the endianness of the guest, rather than just
> forcing things to little-endian?

With my patch and the following adaptation to
ioeventfd_in_range (in virt/kvm/eventfd.c):

	switch (len) {
	case 1:
		_val = *(u8 *)val;
		break;
	case 2:
		_val = le16_to_cpu(*(u16 *)val);
		break;
	case 4:
		_val = le32_to_cpu(*(u32 *)val);
		break;
	case 8:
		_val = le64_to_cpu(*(u64 *)val);
		break;
	default:
		return false;
	}

	return _val == le64_to_cpu(p->datamatch) ? true : false;

datamatch is properly evaluated on either endianness.

The current code in ioeventfd_in_range looks fragile to me (for big
endian systems) and didn't work with kvmtool:

	switch (len) {
	case 1:
		_val = *(u8 *)val;
		break;
	case 2:
		_val = *(u16 *)val;
		break;
	case 4:
		_val = *(u32 *)val;
		break;
	case 8:
		_val = *(u64 *)val;
		break;
	default:
		return false;
	}

	return _val == p->datamatch ? true : false;

But now I see that, w/o a corresponding kernel change, the patch shouldn't
be merged.


Andreas
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/5] vhost: support upto 509 memory regions

2015-06-17 Thread Igor Mammedov
On Wed, 17 Jun 2015 08:34:26 +0200
"Michael S. Tsirkin"  wrote:

> On Wed, Jun 17, 2015 at 12:00:56AM +0200, Igor Mammedov wrote:
> > On Tue, 16 Jun 2015 23:14:20 +0200
> > "Michael S. Tsirkin"  wrote:
> > 
> > > On Tue, Jun 16, 2015 at 06:33:37PM +0200, Igor Mammedov wrote:
> > > > since commit
> > > >  1d4e7e3 kvm: x86: increase user memory slots to 509
> > > > 
> > > > it became possible to use a bigger amount of memory
> > > > slots, which is used by memory hotplug for
> > > > registering hotplugged memory.
> > > > However QEMU crashes if it's used with more than ~60
> > > > pc-dimm devices and vhost-net since host kernel
> > > > in module vhost-net refuses to accept more than 65
> > > > memory regions.
> > > > 
> > > > Increase VHOST_MEMORY_MAX_NREGIONS from 65 to 509
> > > 
> > > It was 64, not 65.
> > > 
> > > > to match KVM_USER_MEM_SLOTS fixes issue for vhost-net.
> > > > 
> > > > Signed-off-by: Igor Mammedov 
> > > 
> > > Still thinking about this: can you reorder this to
> > > be the last patch in the series please?
> > sure
> > 
> > > 
> > > Also - 509?
> > userspace memory slots in terms of KVM, I made it match
> > KVM's allotment of memory slots for userspace side.
> 
> Maybe KVM has its reasons for this #. I don't see
> why we need to match this exactly.
np, I can cap it at a safe 300 slots, but it's unlikely that it
would cut off 1 extra hop since it's capped by QEMU
at 256+[initial fragmented memory]

> 
> > > I think if we are changing this, it'd be nice to
> > > create a way for userspace to discover the support
> > > and the # of regions supported.
> > That was my first idea: before extending KVM's memslots,
> > teach the kernel to tell QEMU this number so that QEMU
> > at least would be able to check if a new memory slot could
> > be added, but I was redirected to a simpler solution
> > of just extending vs overdoing things.
> > Currently QEMU supports up to ~250 memslots, so 509
> > is about twice as high as we need, so it should work for the
> > near future
> 
> Yes but old kernels are still around. Would be nice if you
> can detect them.
> 
> > but eventually we might still teach kernel and QEMU
> > to make things more robust.
> 
> A new ioctl would be easy to add, I think it's a good
> idea generally.
I can try to do something like this on top of this series.

> 
> > > 
> > > 
> > > > ---
> > > >  drivers/vhost/vhost.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > > index 99931a0..6a18c92 100644
> > > > --- a/drivers/vhost/vhost.c
> > > > +++ b/drivers/vhost/vhost.c
> > > > @@ -30,7 +30,7 @@
> > > >  #include "vhost.h"
> > > >  
> > > >  enum {
> > > > -   VHOST_MEMORY_MAX_NREGIONS = 64,
> > > > +   VHOST_MEMORY_MAX_NREGIONS = 509,
> > > > VHOST_MEMORY_F_LOG = 0x1,
> > > >  };
> > > >  
> > > > -- 
> > > > 1.8.3.1



Re: [PATCH 0/5] vhost: support upto 509 memory regions

2015-06-17 Thread Igor Mammedov
On Wed, 17 Jun 2015 08:31:23 +0200
"Michael S. Tsirkin"  wrote:

> On Wed, Jun 17, 2015 at 12:19:15AM +0200, Igor Mammedov wrote:
> > On Tue, 16 Jun 2015 23:16:07 +0200
> > "Michael S. Tsirkin"  wrote:
> > 
> > > On Tue, Jun 16, 2015 at 06:33:34PM +0200, Igor Mammedov wrote:
> > > > The series extends vhost to support up to 509 memory regions,
> > > > and adds some vhost:translate_desc() performance improvements
> > > > so it won't regress when memslots are increased to 509.
> > > > 
> > > > It fixes a running VM crashing during memory hotplug due
> > > > to vhost refusing to accept more than 64 memory regions.
> > > > 
> > > > It's only a host kernel side fix to make it work with QEMU
> > > > versions that support memory hotplug. But I'll continue
> > > > to work on a QEMU side solution to reduce the amount of memory
> > > > regions to make things even better.
> > > 
> > > I'm concerned userspace work will be harder, in particular,
> > > performance gains will be harder to measure.
> > it appears so, so far.
> > 
> > > How about a flag to disable caching?
> > I've tried to measure the cost of a cache miss but without much luck;
> > the difference between the version with cache and with caching removed
> > was within the margin of error (±10ns) (i.e. not measurable on my
> > 5min/10*10^6 test workload).
> 
> Confused. I thought it was very much measurable.
> So why add a cache if you can't measure its effect?
I haven't been able to measure the immediate delta between function
start/end with precision better than 10ns; perhaps the method used
(SystemTap) is to blame.
But it's still possible to measure it indirectly, like the 2% from 5/5.

> 
> > Also I'm concerned that adding an extra fetch+branch for flag
> > checking will make things worse for the likely path of a cache hit,
> > so I'd avoid it if possible.
> > 
> > Or do you mean a simple global per-module flag to disable it, and
> > wrapping the thing in a static key so that it will be a cheap jump
> > to skip the cache?
> 
> Something like this, yes.
ok, will do.

> 
> > > > Performance-wise, for a guest (with, in my case, 3 memory regions)
> > > > and netperf's UDP_RR workload, translate_desc() execution
> > > > time as a share of the total workload is:
> > > > 
> > > > Memory  |1G RAM|cached|non cached
> > > > regions #   |  3   |  53  |  53
> > > > 
> > > > upstream| 0.3% |  -   | 3.5%
> > > > 
> > > > this series | 0.2% | 0.5% | 0.7%
> > > > 
> > > > where the "non cached" column reflects a thrashing workload
> > > > with constant cache misses. More details on timing are in the
> > > > respective patches.
> > > > 
> > > > Igor Mammedov (5):
> > > >   vhost: use binary search instead of linear in find_region()
> > > >   vhost: extend memory regions allocation to vmalloc
> > > >   vhost: support upto 509 memory regions
> > > >   vhost: add per VQ memory region caching
> > > >   vhost: translate_desc: optimization for desc.len < region size
> > > > 
> > > >  drivers/vhost/vhost.c | 95 +--
> > > >  drivers/vhost/vhost.h |  1 +
> > > >  2 files changed, 71 insertions(+), 25 deletions(-)
> > > > 
> > > > -- 
> > > > 1.8.3.1



Re: [PATCH 3/5] vhost: support upto 509 memory regions

2015-06-17 Thread Michael S. Tsirkin
On Wed, Jun 17, 2015 at 09:28:02AM +0200, Igor Mammedov wrote:
> On Wed, 17 Jun 2015 08:34:26 +0200
> "Michael S. Tsirkin"  wrote:
> 
> > On Wed, Jun 17, 2015 at 12:00:56AM +0200, Igor Mammedov wrote:
> > > On Tue, 16 Jun 2015 23:14:20 +0200
> > > "Michael S. Tsirkin"  wrote:
> > > 
> > > > On Tue, Jun 16, 2015 at 06:33:37PM +0200, Igor Mammedov wrote:
> > > > > since commit
> > > > >  1d4e7e3 kvm: x86: increase user memory slots to 509
> > > > > 
> > > > > it became possible to use a bigger amount of memory
> > > > > slots, which is used by memory hotplug for
> > > > > registering hotplugged memory.
> > > > > However QEMU crashes if it's used with more than ~60
> > > > > pc-dimm devices and vhost-net since host kernel
> > > > > in module vhost-net refuses to accept more than 65
> > > > > memory regions.
> > > > > 
> > > > > Increase VHOST_MEMORY_MAX_NREGIONS from 65 to 509
> > > > 
> > > > It was 64, not 65.
> > > > 
> > > > > to match KVM_USER_MEM_SLOTS fixes issue for vhost-net.
> > > > > 
> > > > > Signed-off-by: Igor Mammedov 
> > > > 
> > > > Still thinking about this: can you reorder this to
> > > > be the last patch in the series please?
> > > sure
> > > 
> > > > 
> > > > Also - 509?
> > > userspace memory slots in terms of KVM, I made it match
> > > KVM's allotment of memory slots for userspace side.
> > 
> > Maybe KVM has its reasons for this #. I don't see
> > why we need to match this exactly.
> np, I can cap it at a safe 300 slots, but it's unlikely that it
> would cut off 1 extra hop since it's capped by QEMU
> at 256+[initial fragmented memory]

But what's the point? We allocate 32 bytes per slot.
300*32 = 9600 which is more than 8K, so we are doing
an order-3 allocation anyway.
If we could cap it at 8K (256 slots) that would make sense
since we could avoid wasting vmalloc space.

I'm still not very happy with the whole approach,
giving userspace the ability to allocate 4 whole pages
of kernel memory like this.

> > 
> > > > I think if we are changing this, it'd be nice to
> > > > create a way for userspace to discover the support
> > > > and the # of regions supported.
> > > That was my first idea: before extending KVM's memslots,
> > > teach the kernel to tell QEMU this number so that QEMU
> > > at least would be able to check if a new memory slot could
> > > be added, but I was redirected to a simpler solution
> > > of just extending vs overdoing things.
> > > Currently QEMU supports up to ~250 memslots, so 509
> > > is about twice as high as we need, so it should work for the
> > > near future
> > 
> > Yes but old kernels are still around. Would be nice if you
> > can detect them.
> > 
> > > but eventually we might still teach kernel and QEMU
> > > to make things more robust.
> > 
> > A new ioctl would be easy to add, I think it's a good
> > idea generally.
> I can try to do something like this on top of this series.
> 
> > 
> > > > 
> > > > 
> > > > > ---
> > > > >  drivers/vhost/vhost.c | 2 +-
> > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > > > index 99931a0..6a18c92 100644
> > > > > --- a/drivers/vhost/vhost.c
> > > > > +++ b/drivers/vhost/vhost.c
> > > > > @@ -30,7 +30,7 @@
> > > > >  #include "vhost.h"
> > > > >  
> > > > >  enum {
> > > > > - VHOST_MEMORY_MAX_NREGIONS = 64,
> > > > > + VHOST_MEMORY_MAX_NREGIONS = 509,
> > > > >   VHOST_MEMORY_F_LOG = 0x1,
> > > > >  };
> > > > >  
> > > > > -- 
> > > > > 1.8.3.1


Re: [PATCH 0/5] vhost: support upto 509 memory regions

2015-06-17 Thread Michael S. Tsirkin
On Wed, Jun 17, 2015 at 09:33:57AM +0200, Igor Mammedov wrote:
> On Wed, 17 Jun 2015 08:31:23 +0200
> "Michael S. Tsirkin"  wrote:
> 
> > On Wed, Jun 17, 2015 at 12:19:15AM +0200, Igor Mammedov wrote:
> > > On Tue, 16 Jun 2015 23:16:07 +0200
> > > "Michael S. Tsirkin"  wrote:
> > > 
> > > > On Tue, Jun 16, 2015 at 06:33:34PM +0200, Igor Mammedov wrote:
> > > > > The series extends vhost to support up to 509 memory regions,
> > > > > and adds some vhost:translate_desc() performance improvements
> > > > > so it won't regress when memslots are increased to 509.
> > > > > 
> > > > > It fixes a running VM crashing during memory hotplug due
> > > > > to vhost refusing to accept more than 64 memory regions.
> > > > > 
> > > > > It's only a host kernel side fix to make it work with QEMU
> > > > > versions that support memory hotplug. But I'll continue
> > > > > to work on a QEMU side solution to reduce the amount of memory
> > > > > regions to make things even better.
> > > > 
> > > > I'm concerned userspace work will be harder, in particular,
> > > > performance gains will be harder to measure.
> > > it appears so, so far.
> > > 
> > > > How about a flag to disable caching?
> > > I've tried to measure the cost of a cache miss but without much luck;
> > > the difference between the version with cache and with caching removed
> > > was within the margin of error (±10ns) (i.e. not measurable on my
> > > 5min/10*10^6 test workload).
> > 
> > Confused. I thought it was very much measurable.
> > So why add a cache if you can't measure its effect?
> I haven't been able to measure the immediate delta between function
> start/end with precision better than 10ns; perhaps the method used
> (SystemTap) is to blame.
> But it's still possible to measure it indirectly, like the 2% from 5/5.

Ah, makes sense.

> > 
> > > Also I'm concerned that adding an extra fetch+branch for flag
> > > checking will make things worse for the likely path of a cache hit,
> > > so I'd avoid it if possible.
> > > 
> > > Or do you mean a simple global per-module flag to disable it, and
> > > wrapping the thing in a static key so that it will be a cheap jump
> > > to skip the cache?
> > 
> > Something like this, yes.
> ok, will do.
> 
> > 
> > > > > Performance-wise, for a guest (with, in my case, 3 memory regions)
> > > > > and netperf's UDP_RR workload, translate_desc() execution
> > > > > time as a share of the total workload is:
> > > > > 
> > > > > Memory  |1G RAM|cached|non cached
> > > > > regions #   |  3   |  53  |  53
> > > > > 
> > > > > upstream| 0.3% |  -   | 3.5%
> > > > > 
> > > > > this series | 0.2% | 0.5% | 0.7%
> > > > > 
> > > > > where the "non cached" column reflects a thrashing workload
> > > > > with constant cache misses. More details on timing are in the
> > > > > respective patches.
> > > > > 
> > > > > Igor Mammedov (5):
> > > > >   vhost: use binary search instead of linear in find_region()
> > > > >   vhost: extend memory regions allocation to vmalloc
> > > > >   vhost: support upto 509 memory regions
> > > > >   vhost: add per VQ memory region caching
> > > > >   vhost: translate_desc: optimization for desc.len < region size
> > > > > 
> > > > >  drivers/vhost/vhost.c | 95 +--
> > > > >  drivers/vhost/vhost.h |  1 +
> > > > >  2 files changed, 71 insertions(+), 25 deletions(-)
> > > > > 
> > > > > -- 
> > > > > 1.8.3.1


Re: [PATCH v3 17/18] x86/kvm/tsc: Drop extra barrier and use rdtsc_ordered in kvmclock

2015-06-17 Thread Paolo Bonzini


On 17/06/2015 02:36, Andy Lutomirski wrote:
> __pvclock_read_cycles had an unnecessary barrier.  Get rid of that
> barrier and clean up the code by just using rdtsc_ordered().
> 
> Cc: Paolo Bonzini 
> Cc: Radim Krcmar 
> Cc: Marcelo Tosatti 
> Cc: kvm@vger.kernel.org
> Signed-off-by: Andy Lutomirski 
> ---
> 
> I'm hoping to get an ack for this to go in through -tip.  (Arguably
> I'm the maintainer of this code given how it's used, but I should
> still ask for an ack.)
> 
> arch/x86/include/asm/pvclock.h | 21 -
>  1 file changed, 12 insertions(+), 9 deletions(-)

Can you send a URL to the rest of the series?  I've never even seen v1
or v2 so I have no idea of what this is about.

> diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h
> index 6084bce345fc..cf2329ca4812 100644
> --- a/arch/x86/include/asm/pvclock.h
> +++ b/arch/x86/include/asm/pvclock.h
> @@ -62,7 +62,18 @@ static inline u64 pvclock_scale_delta(u64 delta, u32 mul_frac, int shift)
>  static __always_inline
>  u64 pvclock_get_nsec_offset(const struct pvclock_vcpu_time_info *src)
>  {
> - u64 delta = rdtsc() - src->tsc_timestamp;
> + /*
> +  * Note: emulated platforms which do not advertise SSE2 support
> +  * break rdtsc_ordered, resulting in kvmclock not using the
> +  * necessary RDTSC barriers.  Without barriers, it is possible
> +  * that RDTSC instruction is executed before prior loads,
> +  * resulting in violation of monotonicity.
> +  *
> +  * On an SMP guest without SSE2, it's unclear how anything is
> +  * supposed to work correctly, though -- memory fences
> +  * (e.g. smp_mb) are important for more than just timing.
> +  */

On an SMP guest without SSE2, memory fences are obtained with e.g. "lock
addb $0, (%esp)".

> + u64 delta = rdtsc_ordered() - src->tsc_timestamp;
>   return pvclock_scale_delta(delta, src->tsc_to_system_mul,
>  src->tsc_shift);
>  }
> @@ -76,17 +87,9 @@ unsigned __pvclock_read_cycles(const struct pvclock_vcpu_time_info *src,
>   u8 ret_flags;
>  
>   version = src->version;
> - /* Note: emulated platforms which do not advertise SSE2 support
> -  * result in kvmclock not using the necessary RDTSC barriers.
> -  * Without barriers, it is possible that RDTSC instruction reads from
> -  * the time stamp counter outside rdtsc_barrier protected section
> -  * below, resulting in violation of monotonicity.
> -  */
> - rdtsc_barrier();
>   offset = pvclock_get_nsec_offset(src);
>   ret = src->system_time + offset;
>   ret_flags = src->flags;
> - rdtsc_barrier();
>  
>   *cycles = ret;
>   *flags = ret_flags;
> 


Re: [PATCH v3 3/4] KVM: x86: Add EOI exit bitmap inference

2015-06-17 Thread Paolo Bonzini


On 09/06/2015 04:16, Wanpeng Li wrote:
>>
>> So in the end the patched vcpu_scan_ioapic becomes
>>
>> if (kvm_apic_hw_enabled(vcpu->arch.apic))
> 
> s/kvm_apic_hw_enabled(vcpu->arch.apic)/!kvm_apic_hw_enabled(vcpu->arch.apic)

Right, thanks for the correction.

Paolo


Re: [PATCH v2 09/13] KVM: x86: pass kvm_mmu_page to gfn_to_rmap

2015-06-17 Thread Paolo Bonzini


On 09/06/2015 05:28, Xiao Guangrong wrote:
>>
>> -rmapp = gfn_to_rmap(kvm, sp->gfn, PT_PAGE_TABLE_LEVEL);
>> +slots = kvm_memslots(kvm);
>> +slot = __gfn_to_memslot(slots, sp->gfn);
>> +rmapp = __gfn_to_rmap(sp->gfn, PT_PAGE_TABLE_LEVEL, slot);
>>
> 
> Why @sp is not available here?

Because the function forces the level to be PT_PAGE_TABLE_LEVEL rather
than sp->level.

Paolo


Re: [PATCH v2 12/13] KVM: x86: add SMM to the MMU role, support SMRAM address space

2015-06-17 Thread Paolo Bonzini


On 09/06/2015 06:01, Xiao Guangrong wrote:
> 
> 
> On 05/28/2015 01:05 AM, Paolo Bonzini wrote:
>> This is now very simple to do.  The only interesting part is a simple
>> trick to find the right memslot in gfn_to_rmap, retrieving the address
>> space from the spte role word.  The same trick is used in the auditing
>> code.
>>
>> The comment on top of union kvm_mmu_page_role has been stale forever,
> 
> Fortunately, we have documented these fields in mmu.txt, please do it for
> 'smm' as well. :)

Right, done.

>> +/*
>> + * This is left at the top of the word so that
>> + * kvm_memslots_for_spte_role can extract it with a
>> + * simple shift.  While there is room, give it a whole
>> + * byte so it is also faster to load it from memory.
>> + */
>> +unsigned smm:8;
> 
> I suspect if we really need this trick; smm is not the hottest field in
> this struct anyway.

Note that after these patches it is used by gfn_to_rmap, and hence for
example rmap_add.

Paolo


Re: [PATCH 3/5] vhost: support upto 509 memory regions

2015-06-17 Thread Paolo Bonzini


On 17/06/2015 08:34, Michael S. Tsirkin wrote:
>>> > > 
>>> > > Also - 509?
>> > userspace memory slots in terms of KVM, I made it match
>> > KVM's allotment of memory slots for userspace side.
> Maybe KVM has its reasons for this #.

Nice power of two (512) - number of reserved slots required by Intel's
virtualization extensions (3: APIC access page, EPT identity page table,
VMX task state segment).

Paolo

> I don't see
> why we need to match this exactly.
> 


Re: [PATCH 3/5] vhost: support upto 509 memory regions

2015-06-17 Thread Igor Mammedov
On Wed, 17 Jun 2015 09:39:06 +0200
"Michael S. Tsirkin"  wrote:

> On Wed, Jun 17, 2015 at 09:28:02AM +0200, Igor Mammedov wrote:
> > On Wed, 17 Jun 2015 08:34:26 +0200
> > "Michael S. Tsirkin"  wrote:
> > 
> > > On Wed, Jun 17, 2015 at 12:00:56AM +0200, Igor Mammedov wrote:
> > > > On Tue, 16 Jun 2015 23:14:20 +0200
> > > > "Michael S. Tsirkin"  wrote:
> > > > 
> > > > > On Tue, Jun 16, 2015 at 06:33:37PM +0200, Igor Mammedov wrote:
> > > > > > since commit
> > > > > >  1d4e7e3 kvm: x86: increase user memory slots to 509
> > > > > > 
> > > > > > it became possible to use a bigger amount of memory
> > > > > > slots, which is used by memory hotplug for
> > > > > > registering hotplugged memory.
> > > > > > However QEMU crashes if it's used with more than ~60
> > > > > > pc-dimm devices and vhost-net since host kernel
> > > > > > in module vhost-net refuses to accept more than 65
> > > > > > memory regions.
> > > > > > 
> > > > > > Increase VHOST_MEMORY_MAX_NREGIONS from 65 to 509
> > > > > 
> > > > > It was 64, not 65.
> > > > > 
> > > > > > to match KVM_USER_MEM_SLOTS fixes issue for vhost-net.
> > > > > > 
> > > > > > Signed-off-by: Igor Mammedov 
> > > > > 
> > > > > Still thinking about this: can you reorder this to
> > > > > be the last patch in the series please?
> > > > sure
> > > > 
> > > > > 
> > > > > Also - 509?
> > > > userspace memory slots in terms of KVM, I made it match
> > > > KVM's allotment of memory slots for userspace side.
> > > 
> > > Maybe KVM has its reasons for this #. I don't see
> > > why we need to match this exactly.
> > np, I can cap it at a safe 300 slots, but it's unlikely that it
> > would cut off 1 extra hop since it's capped by QEMU
> > at 256+[initial fragmented memory]
> 
> But what's the point? We allocate 32 bytes per slot.
> 300*32 = 9600 which is more than 8K, so we are doing
> an order-3 allocation anyway.
> If we could cap it at 8K (256 slots) that would make sense
> since we could avoid wasting vmalloc space.
256 is the amount of hotpluggable slots, and there is no way
to predict how initial memory would be fragmented
(i.e. the amount of slots it would take); if we guess wrong,
we are back to square one with crashing userspace.
So I'd stay consistent with KVM's limit of 509, since
it's only a limit, i.e. not the actual amount of allocated slots.

> I'm still not very happy with the whole approach,
> giving userspace the ability to allocate 4 whole pages
> of kernel memory like this.
I'm working in parallel so that userspace won't take so
many slots, but that won't prevent its current versions
from crashing due to the kernel limitation.

 
> > > > > I think if we are changing this, it'd be nice to
> > > > > create a way for userspace to discover the support
> > > > > and the # of regions supported.
> > > > That was my first idea: before extending KVM's memslots,
> > > > teach the kernel to tell QEMU this number so that QEMU
> > > > at least would be able to check if a new memory slot could
> > > > be added, but I was redirected to a simpler solution
> > > > of just extending vs overdoing things.
> > > > Currently QEMU supports up to ~250 memslots, so 509
> > > > is about twice as high as we need, so it should work for the
> > > > near future
> > > 
> > > Yes but old kernels are still around. Would be nice if you
> > > can detect them.
> > > 
> > > > but eventually we might still teach kernel and QEMU
> > > > to make things more robust.
> > > 
> > > A new ioctl would be easy to add, I think it's a good
> > > idea generally.
> > I can try to do something like this on top of this series.
> > 
> > > 
> > > > > 
> > > > > 
> > > > > > ---
> > > > > >  drivers/vhost/vhost.c | 2 +-
> > > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > 
> > > > > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > > > > index 99931a0..6a18c92 100644
> > > > > > --- a/drivers/vhost/vhost.c
> > > > > > +++ b/drivers/vhost/vhost.c
> > > > > > @@ -30,7 +30,7 @@
> > > > > >  #include "vhost.h"
> > > > > >  
> > > > > >  enum {
> > > > > > -   VHOST_MEMORY_MAX_NREGIONS = 64,
> > > > > > +   VHOST_MEMORY_MAX_NREGIONS = 509,
> > > > > > VHOST_MEMORY_F_LOG = 0x1,
> > > > > >  };
> > > > > >  
> > > > > > -- 
> > > > > > 1.8.3.1



[PATCH] arm64/kvm: Add generic v8 KVM target

2015-06-17 Thread Suzuki K. Poulose
From: "Suzuki K. Poulose" 

This patch adds a generic ARM v8 KVM target CPU type for use
by new CPUs which eventually end up using the common sys_reg
table. For backward compatibility the existing targets have been
preserved. Any new target CPU that can be covered by the generic v8
sys_reg tables should make use of the new generic target.

Signed-off-by: Suzuki K. Poulose 
Acked-by: Marc Zyngier 
---
 arch/arm64/include/uapi/asm/kvm.h|   10 --
 arch/arm64/kvm/guest.c   |3 ++-
 arch/arm64/kvm/sys_regs_generic_v8.c |2 ++
 3 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index d268320..f5de418 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -53,14 +53,20 @@ struct kvm_regs {
struct user_fpsimd_state fp_regs;
 };
 
-/* Supported Processor Types */
+/*
+ * Supported CPU Targets - Adding a new target type is not recommended,
+ * unless there are some special registers not supported by the
+ * genericv8 syreg table.
+ */
 #define KVM_ARM_TARGET_AEM_V8  0
 #define KVM_ARM_TARGET_FOUNDATION_V8   1
 #define KVM_ARM_TARGET_CORTEX_A57  2
 #define KVM_ARM_TARGET_XGENE_POTENZA   3
 #define KVM_ARM_TARGET_CORTEX_A53  4
+/* Generic ARM v8 target */
+#define KVM_ARM_TARGET_GENERIC_V8  5
 
-#define KVM_ARM_NUM_TARGETS	5
+#define KVM_ARM_NUM_TARGETS	6
 
 /* KVM_ARM_SET_DEVICE_ADDR ioctl id encoding */
 #define KVM_ARM_DEVICE_TYPE_SHIFT  0
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 9535bd5..124aa57 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -293,7 +293,8 @@ int __attribute_const__ kvm_target_cpu(void)
break;
};
 
-   return -EINVAL;
+   /* Return a default generic target */
+   return KVM_ARM_TARGET_GENERIC_V8;
 }
 
 int kvm_vcpu_preferred_target(struct kvm_vcpu_init *init)
diff --git a/arch/arm64/kvm/sys_regs_generic_v8.c b/arch/arm64/kvm/sys_regs_generic_v8.c
index 475fd29..1e45768 100644
--- a/arch/arm64/kvm/sys_regs_generic_v8.c
+++ b/arch/arm64/kvm/sys_regs_generic_v8.c
@@ -94,6 +94,8 @@ static int __init sys_reg_genericv8_init(void)
  &genericv8_target_table);
kvm_register_target_sys_reg_table(KVM_ARM_TARGET_XGENE_POTENZA,
  &genericv8_target_table);
+   kvm_register_target_sys_reg_table(KVM_ARM_TARGET_GENERIC_V8,
+ &genericv8_target_table);
 
return 0;
 }
-- 
1.7.9.5



RE: IRQFD support with GICv3 ITS (WAS: RE: [PATCH 00/13] arm64: KVM: GICv3 ITS emulation)

2015-06-17 Thread Pavel Fedin
 PING!
 The discussion has suddenly stopped... What is our status? Is the ITS v2
patch being developed, or what? And do we have some conclusion on irqfd?

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia


> -Original Message-
> From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf 
> Of Pavel
> Fedin
> Sent: Wednesday, June 10, 2015 6:30 PM
> To: 'Eric Auger'; 'Marc Zyngier'; 'Andre Przywara'; 
> christoffer.d...@linaro.org
> Cc: kvm...@lists.cs.columbia.edu; linux-arm-ker...@lists.infradead.org;
> kvm@vger.kernel.org
> Subject: RE: IRQFD support with GICv3 ITS (WAS: RE: [PATCH 00/13] arm64: KVM: 
> GICv3 ITS
> emulation)
> 
>  Hi!
> 
> > indeed in newly added qemu kvm-all.c kvm_arch_msi_data_to_gsi we could
> > call a new ioctl that translates the data + deviceid? into an LPI and
> > program irqfd with that LPI. This is done once when setting irqfd up.
> > This also means extending irqfd support to lpi injection, gsi being the
> > LPI index if gsi >= 8192. in that case we continue using
> > kvm_gsi_direct_mapping and gsi still is an IRQ index.
> 
>  This is exactly what i have done in my kernel + qemu. I have added a new KVM 
> capability
> and then in qemu i do this:
> --- cut ---
> if (kvm_gsi_kernel_mapping()) {
> struct kvm_msi msi;
> 
> msi.address_lo = (uint32_t)msg.address;
> msi.address_hi = msg.address >> 32;
> msi.data = le32_to_cpu(msg.data);
> memset(msi.pad, 0, sizeof(msi.pad));
> 
> if (dev) {
> msi.devid = (pci_bus_num(dev->bus) << 8) | dev->devfn;
> msi.flags = KVM_MSI_VALID_DEVID;
> } else {
> msi.devid = 0;
> msi.flags = 0;
> }
> 
> return kvm_vm_ioctl(s, KVM_TRANSLATE_MSI, &msi);
> }
> --- cut ---
>  KVM_TRANSLATE_MSI returns an LPI number. This seemed to be the simplest
> and fastest thing to do.
>  If someone is interested, I could prepare an RFC patch series for this,
> which would apply on top of Andre's ITS implementation.
> 
> Kind regards,
> Pavel Fedin
> Expert Engineer
> Samsung Electronics Research center Russia
> 
> 


Re: [PATCH v3 01/18] x86/tsc: Inline native_read_tsc and remove __native_read_tsc

2015-06-17 Thread Borislav Petkov
On Tue, Jun 16, 2015 at 05:35:49PM -0700, Andy Lutomirski wrote:
> In cdc7957d1954 ("x86: move native_read_tsc() offline"),
> native_read_tsc was moved out of line, presumably for some
> now-obsolete vDSO-related reason.  Undo it.
> 
> The entire rdtsc, shl, or sequence is only 11 bytes, and calls via
> rdtscl and similar helpers were already inlined.
> 
> Signed-off-by: Andy Lutomirski 
> ---
>  arch/x86/entry/vdso/vclock_gettime.c  | 2 +-
>  arch/x86/include/asm/msr.h| 8 +++-
>  arch/x86/include/asm/pvclock.h| 2 +-
>  arch/x86/include/asm/stackprotector.h | 2 +-
>  arch/x86/include/asm/tsc.h| 2 +-
>  arch/x86/kernel/apb_timer.c   | 4 ++--
>  arch/x86/kernel/tsc.c | 6 --
>  7 files changed, 9 insertions(+), 17 deletions(-)

Acked-by: Borislav Petkov 

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--


[PATCH v2] arm: KVM: force execution of HCPTR access on VM exit

2015-06-17 Thread Marc Zyngier
On VM entry, we disable access to the VFP registers in order to
perform a lazy save/restore of these registers.

On VM exit, we restore access, test if we did enable them before,
and save/restore the guest/host registers if necessary. In this
sequence, the FPEXC register is always accessed, irrespective
of the trapping configuration.

If the guest didn't touch the VFP registers, then the HCPTR access
has now enabled such access, but we're missing a barrier to ensure
architectural execution of the new HCPTR configuration. If the HCPTR
access has been delayed/reordered, the subsequent access to FPEXC
will cause a trap, which we aren't prepared to handle at all.

The same condition exists when trapping to enable VFP for the guest.

The fix is to introduce a barrier after enabling VFP access. In the
vmexit case, it can be relaxed to only take place if the guest hasn't
accessed its view of the VFP registers, making the access to FPEXC safe.

The set_hcptr macro is modified to deal with both vmenter/vmexit and
vmtrap operations, and now takes an optional label that is branched to
when the guest hasn't touched the VFP registers.

Reported-by: Vikram Sethi 
Cc: sta...@kernel.org   # v3.9+
Signed-off-by: Marc Zyngier 
---
* From v1:
  - Changed from a discrete fix to be integrated in set_hcptr
  - Also introduce an ISB on vmtrap (reported by Vikram)
  - Dropped Christoffer Reviewed-by, due to significant changes

 arch/arm/kvm/interrupts.S  | 10 --
 arch/arm/kvm/interrupts_head.S | 20 ++--
 2 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index 79caf79..f7db3a5 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -170,13 +170,9 @@ __kvm_vcpu_return:
@ Don't trap coprocessor accesses for host kernel
set_hstr vmexit
set_hdcr vmexit
-   set_hcptr vmexit, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
+   set_hcptr vmexit, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11)), after_vfp_restore
 
 #ifdef CONFIG_VFPv3
-   @ Save floating point registers we if let guest use them.
-   tst r2, #(HCPTR_TCP(10) | HCPTR_TCP(11))
-   bne after_vfp_restore
-
@ Switch VFP/NEON hardware state to the host's
add r7, vcpu, #VCPU_VFP_GUEST
store_vfp_state r7
@@ -188,6 +184,8 @@ after_vfp_restore:
@ Restore FPEXC_EN which we clobbered on entry
pop {r2}
VFPFMXR FPEXC, r2
+#else
+after_vfp_restore:
 #endif
 
@ Reset Hyp-role
@@ -483,7 +481,7 @@ switch_to_guest_vfp:
push{r3-r7}
 
@ NEON/VFP used.  Turn on VFP access.
-   set_hcptr vmexit, (HCPTR_TCP(10) | HCPTR_TCP(11))
+   set_hcptr vmtrap, (HCPTR_TCP(10) | HCPTR_TCP(11))
 
@ Switch VFP/NEON hardware state to the guest's
add r7, r0, #VCPU_VFP_HOST
diff --git a/arch/arm/kvm/interrupts_head.S b/arch/arm/kvm/interrupts_head.S
index 35e4a3a..48efe2e 100644
--- a/arch/arm/kvm/interrupts_head.S
+++ b/arch/arm/kvm/interrupts_head.S
@@ -591,8 +591,13 @@ ARM_BE8(revr6, r6  )
 .endm
 
 /* Configures the HCPTR (Hyp Coprocessor Trap Register) on entry/return
- * (hardware reset value is 0). Keep previous value in r2. */
-.macro set_hcptr operation, mask
+ * (hardware reset value is 0). Keep previous value in r2.
+ * An ISB is emitted on vmexit/vmtrap, but executed on vmexit only if
+ * VFP wasn't already enabled (always executed on vmtrap).
+ * If a label is specified with vmexit, it is branched to if VFP wasn't
+ * enabled.
+ */
+.macro set_hcptr operation, mask, label = none
mrc p15, 4, r2, c1, c1, 2
ldr r3, =\mask
.if \operation == vmentry
@@ -601,6 +606,17 @@ ARM_BE8(revr6, r6  )
bic r3, r2, r3  @ Don't trap defined coproc-accesses
.endif
mcr p15, 4, r3, c1, c1, 2
+   .if \operation != vmentry
+   .if \operation == vmexit
+   tst r2, #(HCPTR_TCP(10) | HCPTR_TCP(11))
+   beq 1f
+   .endif
+   isb
+   .if \label != none
+   b   \label
+   .endif
+1:
+   .endif
 .endm
 
 /* Configures the HDCR (Hyp Debug Configuration Register) on entry/return
-- 
2.1.4



Re: IRQFD support with GICv3 ITS (WAS: RE: [PATCH 00/13] arm64: KVM: GICv3 ITS emulation)

2015-06-17 Thread Marc Zyngier
On 17/06/15 10:21, Pavel Fedin wrote:
>  PING!
>  The discussion has suddenly stopped... What is our status? Is ITS v2
> patch being developed, or what? And do we have some conclusion on irqfd ?

Hmmm. You may not have noticed it, but we're actually all quite busy
at the moment (hint, we're at -rc8, and the next merge window is about
to open).

As for the state of the ITS, it is still in review, and I expect Andre
will post an updated series after the merge window.

As for your irqfd proposal, see below:

> Kind regards,
> Pavel Fedin
> Expert Engineer
> Samsung Electronics Research center Russia
> 
> 
>> -Original Message-
>> From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf 
>> Of Pavel
>> Fedin
>> Sent: Wednesday, June 10, 2015 6:30 PM
>> To: 'Eric Auger'; 'Marc Zyngier'; 'Andre Przywara'; 
>> christoffer.d...@linaro.org
>> Cc: kvm...@lists.cs.columbia.edu; linux-arm-ker...@lists.infradead.org;
>> kvm@vger.kernel.org
>> Subject: RE: IRQFD support with GICv3 ITS (WAS: RE: [PATCH 00/13] arm64: 
>> KVM: GICv3 ITS
>> emulation)
>>
>>  Hi!
>>
>>> indeed in newly added qemu kvm-all.c kvm_arch_msi_data_to_gsi we could
>>> call a new ioctl that translates the data + deviceid? into an LPI and
>>> program irqfd with that LPI. This is done once when setting irqfd up.
>>> This also means extending irqfd support to lpi injection, gsi being the
>>> LPI index if gsi >= 8192. in that case we continue using
>>> kvm_gsi_direct_mapping and gsi still is an IRQ index.
>>
>>  This is exactly what i have done in my kernel + qemu. I have added a new 
>> KVM capability
>> and then in qemu i do this:
>> --- cut ---
>> if (kvm_gsi_kernel_mapping()) {
>> struct kvm_msi msi;
>>
>> msi.address_lo = (uint32_t)msg.address;
>> msi.address_hi = msg.address >> 32;
>> msi.data = le32_to_cpu(msg.data);
>> memset(msi.pad, 0, sizeof(msi.pad));
>>
>> if (dev) {
>> msi.devid = (pci_bus_num(dev->bus) << 8) | dev->devfn;
>> msi.flags = KVM_MSI_VALID_DEVID;
>> } else {
>> msi.devid = 0;
>> msi.flags = 0;
>> }
>>
>> return kvm_vm_ioctl(s, KVM_TRANSLATE_MSI, &msi);
>> }
>> --- cut ---
>>  KVM_TRANSLATE_MSI returns an LPI number. This seemed to be the simplest
>> and fastest thing to do.
>>  If someone is interested, i could prepare an RFC patch series for this,
>> which would apply on top of Andre's ITS implementation.

This feels just wrong. The LPI number is under complete control of the
guest, and can be changed at any time. You can never rely on it to be
stable.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH v3 02/18] x86/msr/kvm: Remove vget_cycles()

2015-06-17 Thread Borislav Petkov
On Tue, Jun 16, 2015 at 05:35:50PM -0700, Andy Lutomirski wrote:
> The only caller was kvm's read_tsc.  The only difference between
> vget_cycles and native_read_tsc was that vget_cycles returned zero
> instead of crashing on TSC-less systems.  KVM already checks
> vclock_mode before calling that function, so the extra check is
> unnecessary.
> 
> (Off-topic, but the whole KVM clock host implementation is gross.
>  IMO it should be rewritten.)
> 
> Signed-off-by: Andy Lutomirski 
> ---
>  arch/x86/include/asm/tsc.h | 13 -
>  arch/x86/kvm/x86.c |  2 +-
>  2 files changed, 1 insertion(+), 14 deletions(-)

Acked-by: Borislav Petkov 

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--


[PATCH 0/3] kvmtool: fixes for PowerPC

2015-06-17 Thread Andre Przywara
Hello,

some patches to fix at least the build of the new kvmtool for
PowerPC. I could only compile test it so far, so I'd be grateful
if people more familiar with that architecture can have a look
and maybe even test it on actual machines.

Cheers,
Andre.

Andre Przywara (3):
  powerpc: implement barrier primitives
  powerpc: use default endianness for converting guest/init
  powerpc: add hvcall.h header from Linux

 Makefile  |   1 -
 powerpc/include/asm/hvcall.h  | 287 ++
 powerpc/include/kvm/barrier.h |   4 +-
 powerpc/spapr.h   |   3 -
 4 files changed, 290 insertions(+), 5 deletions(-)
 create mode 100644 powerpc/include/asm/hvcall.h

-- 
2.3.5



[PATCH 2/3] powerpc: use default endianness for converting guest/init

2015-06-17 Thread Andre Przywara
For converting the guest/init binary into an object file, we call
the linker binary, setting the endianness to big endian explicitly
when compiling kvmtool for powerpc.
This breaks if the compiler is actually targeting little endian
(which is true for the Debian port, for instance).
Remove the explicit big endianness switch from the linker call to
allow linking on little endian PowerPC builds again.

Signed-off-by: Andre Przywara 
---
Hi,

this fixed the powerpc64le build for me, while still compiling fine
for big endian. Admittedly this whole init->guest_init.o conversion
has its issues (with MIPS, for instance), which deserve proper fixing,
but let's just fix that build for now.

Andre.

 Makefile | 1 -
 1 file changed, 1 deletion(-)

diff --git a/Makefile b/Makefile
index 6110b8e..c118e1a 100644
--- a/Makefile
+++ b/Makefile
@@ -149,7 +149,6 @@ ifeq ($(ARCH), powerpc)
OBJS+= powerpc/xics.o
ARCH_INCLUDE := powerpc/include
CFLAGS  += -m64
-   LDFLAGS += -m elf64ppc
 
ARCH_WANT_LIBFDT := y
 endif
-- 
2.3.5



[PATCH 3/3] powerpc: add hvcall.h header from Linux

2015-06-17 Thread Andre Przywara
The powerpc code uses some PAPR hypercalls, of which we need the
hypercall number. Copy the macro definition parts from the kernel's
(private) hvcall.h file and remove the extra tricks formerly used
to be able to include this header file directly.

Signed-off-by: Andre Przywara 
---
Hi,

I copied most of the Linux header, without removing
definitions that kvmtool doesn't use. That should make updates
easier. If people would prefer a bespoke header, let me know.

Andre.

 powerpc/include/asm/hvcall.h | 287 +++
 powerpc/spapr.h  |   3 -
 2 files changed, 287 insertions(+), 3 deletions(-)
 create mode 100644 powerpc/include/asm/hvcall.h

diff --git a/powerpc/include/asm/hvcall.h b/powerpc/include/asm/hvcall.h
new file mode 100644
index 000..b6dc250
--- /dev/null
+++ b/powerpc/include/asm/hvcall.h
@@ -0,0 +1,287 @@
+#ifndef _ASM_POWERPC_HVCALL_H
+#define _ASM_POWERPC_HVCALL_H
+
+#define HVSC   .long 0x4422
+
+#define H_SUCCESS  0
+#define H_BUSY 1   /* Hardware busy -- retry later */
+#define H_CLOSED   2   /* Resource closed */
+#define H_NOT_AVAILABLE 3
+#define H_CONSTRAINED  4   /* Resource request constrained to max allowed */
+#define H_PARTIAL   5
+#define H_IN_PROGRESS  14  /* Kind of like busy */
+#define H_PAGE_REGISTERED 15
+#define H_PARTIAL_STORE   16
+#define H_PENDING  17  /* returned from H_POLL_PENDING */
+#define H_CONTINUE 18  /* Returned from H_Join on success */
+#define H_LONG_BUSY_START_RANGE9900  /* Start of long busy range */
+#define H_LONG_BUSY_ORDER_1_MSEC   9900  /* Long busy, hint that 1msec \
+is a good time to retry */
+#define H_LONG_BUSY_ORDER_10_MSEC  9901  /* Long busy, hint that 10msec \
+is a good time to retry */
+#define H_LONG_BUSY_ORDER_100_MSEC 9902  /* Long busy, hint that 100msec \
+is a good time to retry */
+#define H_LONG_BUSY_ORDER_1_SEC9903  /* Long busy, hint that 1sec \
+is a good time to retry */
+#define H_LONG_BUSY_ORDER_10_SEC   9904  /* Long busy, hint that 10sec \
+is a good time to retry */
+#define H_LONG_BUSY_ORDER_100_SEC  9905  /* Long busy, hint that 100sec \
+is a good time to retry */
+#define H_LONG_BUSY_END_RANGE  9905  /* End of long busy range */
+
+/* Internal value used in book3s_hv kvm support; not returned to guests */
+#define H_TOO_HARD 
+
+#define H_HARDWARE -1  /* Hardware error */
+#define H_FUNCTION -2  /* Function not supported */
+#define H_PRIVILEGE-3  /* Caller not privileged */
+#define H_PARAMETER-4  /* Parameter invalid, out-of-range or conflicting */
+#define H_BAD_MODE -5  /* Illegal msr value */
+#define H_PTEG_FULL-6  /* PTEG is full */
+#define H_NOT_FOUND-7  /* PTE was not found" */
+#define H_RESERVED_DABR-8  /* DABR address is reserved by the hypervisor on this processor" */
+#define H_NO_MEM   -9
+#define H_AUTHORITY-10
+#define H_PERMISSION   -11
+#define H_DROPPED  -12
+#define H_SOURCE_PARM  -13
+#define H_DEST_PARM-14
+#define H_REMOTE_PARM  -15
+#define H_RESOURCE -16
+#define H_ADAPTER_PARM  -17
+#define H_RH_PARM   -18
+#define H_RCQ_PARM  -19
+#define H_SCQ_PARM  -20
+#define H_EQ_PARM   -21
+#define H_RT_PARM   -22
+#define H_ST_PARM   -23
+#define H_SIGT_PARM -24
+#define H_TOKEN_PARM-25
+#define H_MLENGTH_PARM  -27
+#define H_MEM_PARM  -28
+#define H_MEM_ACCESS_PARM -29
+#define H_ATTR_PARM -30
+#define H_PORT_PARM -31
+#define H_MCG_PARM  -32
+#define H_VL_PARM   -33
+#define H_TSIZE_PARM-34
+#define H_TRACE_PARM-35
+
+#define H_MASK_PARM -37
+#define H_MCG_FULL  -38
+#define H_ALIAS_EXIST   -39
+#define H_P_COUNTER -40
+#define H_TABLE_FULL-41
+#define H_ALT_TABLE -42
+#define H_MR_CONDITION  -43
+#define H_NOT_ENOUGH_RESOURCES -44
+#define H_R_STATE   -45
+#define H_RESCINDED -46
+#define H_P2   -55
+#define H_P3   -56
+#define H_P4   -57
+#define H_P5   -58
+#define H_P6   -59
+#define H_P7   -60
+#define H_P8   -61
+#define H_P9   -62
+#define H_TOO_BIG  -64
+#define H_OVERLAP  -68
+#define H_INTERRUPT-69
+#define H_BAD_DATA -70
+#define H_NOT_ACTIVE   -71
+#define H_SG_LIST  -72
+#define H_OP_MODE  -73
+#define H_COP_HW   -74
+#define H_UNSUPPORTED_FLAG_START   -256
+#define H_UNSUPPORTED_FLAG_END -511
+#define H_MULTI_THREADS_ACTIVE -9005
+#define H_OUTSTANDING_COP_OPS  -9006
+
+
+/* Long Busy is a condition that can be returned by the firmware

[PATCH 1/3] powerpc: implement barrier primitives

2015-06-17 Thread Andre Przywara
Instead of referring to the Linux header including the barrier
macros, copy over the rather simple implementation for the PowerPC
barrier instructions kvmtool uses. This fixes the build for powerpc.

Signed-off-by: Andre Przywara 
---
Hi,

I just took what kvmtool seems to have used before, I actually have
no idea if "sync" is the right instruction or "lwsync" would do.
Would be nice if some people with PowerPC knowledge could comment.

Cheers,
Andre.

 powerpc/include/kvm/barrier.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/powerpc/include/kvm/barrier.h b/powerpc/include/kvm/barrier.h
index dd5115a..4b708ae 100644
--- a/powerpc/include/kvm/barrier.h
+++ b/powerpc/include/kvm/barrier.h
@@ -1,6 +1,8 @@
 #ifndef _KVM_BARRIER_H_
 #define _KVM_BARRIER_H_
 
-#include 
+#define mb()   asm volatile ("sync" : : : "memory")
+#define rmb()  asm volatile ("sync" : : : "memory")
+#define wmb()  asm volatile ("sync" : : : "memory")
 
 #endif /* _KVM_BARRIER_H_ */
-- 
2.3.5



Re: [PATCH v3 03/18] x86/tsc/paravirt: Remove the read_tsc and read_tscp paravirt hooks

2015-06-17 Thread Borislav Petkov
+ paravirt list.

On Tue, Jun 16, 2015 at 05:35:51PM -0700, Andy Lutomirski wrote:
> We've had read_tsc and read_tscp paravirt hooks since the very
> beginning of paravirt, i.e., d3561b7fa0fb ("[PATCH] paravirt: header
> and stubs for paravirtualisation").  AFAICT the only paravirt guest
> implementation that ever replaced these calls was vmware, and it's
> gone.  Arguably even vmware shouldn't have hooked rdtsc -- we fully
> support systems that don't have a TSC at all, so there's no point
> for a paravirt implementation to pretend that we have a TSC but to
> replace it.
> 
> I also doubt that these hooks actually worked.  Calls to rdtscl and
> rdtscll, which respected the hooks, were used seemingly
> interchangeably with native_read_tsc, which did not.
> 
> Just remove them.  If anyone ever needs them again, they can try
> to make a case for why they need them.
> 
> Before, on a paravirt config:
>     text     data      bss      dec     hex filename
> 13426505  1827056 14508032 29761593 1c62039 vmlinux
> 
> After:
>     text     data      bss      dec     hex filename
> 13426617  1827056 14508032 29761705 1c620a9 vmlinux
> 
> Signed-off-by: Andy Lutomirski 
> ---
>  arch/x86/include/asm/msr.h| 16 
>  arch/x86/include/asm/paravirt.h   | 34 --
>  arch/x86/include/asm/paravirt_types.h |  2 --
>  arch/x86/kernel/paravirt.c|  2 --
>  arch/x86/kernel/paravirt_patch_32.c   |  2 --
>  arch/x86/xen/enlighten.c  |  3 ---
>  6 files changed, 8 insertions(+), 51 deletions(-)

Nice diffstat.

Acked-by: Borislav Petkov 

(leaving in the rest for reference)

> diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
> index 88711470af7f..d1afac7df484 100644
> --- a/arch/x86/include/asm/msr.h
> +++ b/arch/x86/include/asm/msr.h
> @@ -178,12 +178,6 @@ static inline int rdmsrl_safe(unsigned msr, unsigned 
> long long *p)
>   return err;
>  }
>  
> -#define rdtscl(low)  \
> - ((low) = (u32)native_read_tsc())
> -
> -#define rdtscll(val) \
> - ((val) = native_read_tsc())
> -
>  #define rdpmc(counter, low, high)\
>  do { \
>   u64 _l = native_read_pmc((counter));\
> @@ -193,6 +187,14 @@ do { 
> \
>  
>  #define rdpmcl(counter, val) ((val) = native_read_pmc(counter))
>  
> +#endif   /* !CONFIG_PARAVIRT */
> +
> +#define rdtscl(low)  \
> + ((low) = (u32)native_read_tsc())
> +
> +#define rdtscll(val) \
> + ((val) = native_read_tsc())
> +
>  #define rdtscp(low, high, aux)   \
>  do {\
>   unsigned long long _val = native_read_tscp(&(aux)); \
> @@ -202,8 +204,6 @@ do {  
>   \
>  
>  #define rdtscpll(val, aux) (val) = native_read_tscp(&(aux))
>  
> -#endif   /* !CONFIG_PARAVIRT */
> -
>  /*
>   * 64-bit version of wrmsr_safe():
>   */
> diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
> index d143bfad45d7..c2be0375bcad 100644
> --- a/arch/x86/include/asm/paravirt.h
> +++ b/arch/x86/include/asm/paravirt.h
> @@ -174,19 +174,6 @@ static inline int rdmsrl_safe(unsigned msr, unsigned 
> long long *p)
>   return err;
>  }
>  
> -static inline u64 paravirt_read_tsc(void)
> -{
> - return PVOP_CALL0(u64, pv_cpu_ops.read_tsc);
> -}
> -
> -#define rdtscl(low)  \
> -do { \
> - u64 _l = paravirt_read_tsc();   \
> - low = (int)_l;  \
> -} while (0)
> -
> -#define rdtscll(val) (val = paravirt_read_tsc())
> -
>  static inline unsigned long long paravirt_sched_clock(void)
>  {
>   return PVOP_CALL0(unsigned long long, pv_time_ops.sched_clock);
> @@ -215,27 +202,6 @@ do { \
>  
>  #define rdpmcl(counter, val) ((val) = paravirt_read_pmc(counter))
>  
> -static inline unsigned long long paravirt_rdtscp(unsigned int *aux)
> -{
> - return PVOP_CALL1(u64, pv_cpu_ops.read_tscp, aux);
> -}
> -
> -#define rdtscp(low, high, aux)   \
> -do { \
> - int __aux;  \
> - unsigned long __val = paravirt_rdtscp(&__aux);  \
> - (low) = (u32)__val; \
> - (high) = (u32)(__val >> 32);\
> - (aux) = __aux;  \
> -} while (0)
> -
> -#define rdtscpll(val, aux)   \
> -do { \
> - unsigned long 

RE: IRQFD support with GICv3 ITS (WAS: RE: [PATCH 00/13] arm64: KVM: GICv3 ITS emulation)

2015-06-17 Thread Pavel Fedin
 Hello!

> Hmmm. You may not have noticed it, but we're actually all are quite busy
> at the moment (hint, we're at -rc8, and the next merge window is about
> to open).

 Ok ok, I do not mind of course. :) It's just that I expected at least some
quick reply. It's like talking to a person who suddenly starts ignoring you
and turns away without any ACK/NAK. I simply do not know what is wrong.

> This feels just wrong. The LPI number is under complete control of the
> guest, and can be changed at any time. You can never rely on it to be
> stable.

 Heh... Then I'm afraid the only option is the second one: GSI routing. It
would allow associating an irqfd with the full MSI description (data +
address + devID) as it is.
 I'm also currently busy with some strange vhost-net performance issues, so
I'll make another RFC later, after I redo my implementation using routing.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia




Re: [PATCH 4/5] kvmtool: Save datamatch as little endian in {add,del}_event

2015-06-17 Thread Will Deacon
On Wed, Jun 17, 2015 at 08:17:49AM +0100, Andreas Herrmann wrote:
> On Tue, Jun 16, 2015 at 06:17:14PM +0100, Will Deacon wrote:
> > On Mon, Jun 15, 2015 at 12:49:45PM +0100, Andreas Herrmann wrote:
> > > W/o dedicated endianess it's impossible to find reliably a match
> > > e.g. in kernel/virt/kvm/eventfd.c ioeventfd_in_range.
> > 
> > Hmm, but shouldn't this be the endianness of the guest, rather than just
> > forcing things to little-endian?
> 
> With my patch and following adaption to
> ioeventfd_in_range (in virt/kvm/eventfd.c):

[...]

> But now I see, w/o a correponding kernel change the patch shouldn't
> be merged.

Digging a bit deeper, I think it's up to the architecture KVM backend
(in the kernel) to present the mmio buffer to core kvm in the host
endianness.

For example, on ARM, we honour the endianness of the vcpu in
vcpu_data_guest_to_host when we populate the buffer for kvm_io_bus_write
(which is what ends up in the ioeventfd code).

I couldn't find equivalent code for MIPS, but I may have been looking in
the wrong place.

Will


Re: [PATCH v3 04/18] x86/tsc: Replace rdtscll with native_read_tsc

2015-06-17 Thread Borislav Petkov
On Tue, Jun 16, 2015 at 05:35:52PM -0700, Andy Lutomirski wrote:
> Now that the read_tsc paravirt hook is gone, rdtscll() is just a
> wrapper around native_read_tsc().  Unwrap it.
> 
> Signed-off-by: Andy Lutomirski 
> ---
>  arch/x86/boot/compressed/aslr.c  | 2 +-
>  arch/x86/include/asm/msr.h   | 3 ---
>  arch/x86/include/asm/tsc.h   | 5 +
>  arch/x86/kernel/apb_timer.c  | 4 ++--
>  arch/x86/kernel/apic/apic.c  | 8 
>  arch/x86/kernel/cpu/mcheck/mce.c | 4 ++--
>  arch/x86/kernel/espfix_64.c  | 2 +-
>  arch/x86/kernel/hpet.c   | 4 ++--
>  arch/x86/kernel/trace_clock.c| 2 +-
>  arch/x86/kernel/tsc.c| 4 ++--
>  arch/x86/kvm/vmx.c   | 2 +-
>  arch/x86/lib/delay.c | 2 +-
>  drivers/thermal/intel_powerclamp.c   | 4 ++--
>  tools/power/cpupower/debug/kernel/cpufreq-test_tsc.c | 4 ++--
>  14 files changed, 22 insertions(+), 28 deletions(-)

Acked-by: Borislav Petkov 

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--


Re: [PATCH 3/5] vhost: support upto 509 memory regions

2015-06-17 Thread Michael S. Tsirkin
On Wed, Jun 17, 2015 at 10:54:21AM +0200, Igor Mammedov wrote:
> On Wed, 17 Jun 2015 09:39:06 +0200
> "Michael S. Tsirkin"  wrote:
> 
> > On Wed, Jun 17, 2015 at 09:28:02AM +0200, Igor Mammedov wrote:
> > > On Wed, 17 Jun 2015 08:34:26 +0200
> > > "Michael S. Tsirkin"  wrote:
> > > 
> > > > On Wed, Jun 17, 2015 at 12:00:56AM +0200, Igor Mammedov wrote:
> > > > > On Tue, 16 Jun 2015 23:14:20 +0200
> > > > > "Michael S. Tsirkin"  wrote:
> > > > > 
> > > > > > On Tue, Jun 16, 2015 at 06:33:37PM +0200, Igor Mammedov wrote:
> > > > > > > since commit
> > > > > > >  1d4e7e3 kvm: x86: increase user memory slots to 509
> > > > > > > 
> > > > > > > it became possible to use a bigger amount of memory
> > > > > > > slots, which is used by memory hotplug for
> > > > > > > registering hotplugged memory.
> > > > > > > However QEMU crashes if it's used with more than ~60
> > > > > > > pc-dimm devices and vhost-net since host kernel
> > > > > > > in module vhost-net refuses to accept more than 65
> > > > > > > memory regions.
> > > > > > > 
> > > > > > > Increase VHOST_MEMORY_MAX_NREGIONS from 65 to 509
> > > > > > 
> > > > > > It was 64, not 65.
> > > > > > 
> > > > > > > to match KVM_USER_MEM_SLOTS fixes issue for vhost-net.
> > > > > > > 
> > > > > > > Signed-off-by: Igor Mammedov 
> > > > > > 
> > > > > > Still thinking about this: can you reorder this to
> > > > > > be the last patch in the series please?
> > > > > sure
> > > > > 
> > > > > > 
> > > > > > Also - 509?
> > > > > userspace memory slots in terms of KVM, I made it match
> > > > > KVM's allotment of memory slots for userspace side.
> > > > 
> > > > Maybe KVM has its reasons for this #. I don't see
> > > > why we need to match this exactly.
> > > np, I can cap it at safe 300 slots but it's unlikely that it
> > > would take cut off 1 extra hop since it's capped by QEMU
> > > at 256+[initial fragmented memory]
> > 
> > But what's the point? We allocate 32 bytes per slot.
> > 300*32 = 9600 which is more than 8K, so we are doing
> > an order-3 allocation anyway.
> > If we could cap it at 8K (256 slots) that would make sense
> > since we could avoid wasting vmalloc space.
> 256 is amount of hotpluggable slots  and there is no way
> to predict how initial memory would be fragmented
> (i.e. amount of slots it would take), if we guess wrong
> we are back to square one with crashing userspace.
> So I'd stay consistent with KVM's limit 509 since
> it's only limit, i.e. not actual amount of allocated slots.
> 
> > I'm still not very happy with the whole approach,
> > giving userspace ability allocate 4 whole pages
> > of kernel memory like this.
> I'm working in parallel so that userspace won't take so
> many slots but it won't prevent its current versions
> crashing due to kernel limitation.

Right, but at least it's not a regression. If we promise userspace support
for a ton of regions, we can't take it back later, and I'm concerned
about the memory usage.

I think it's already safe to merge the binary lookup patches, and maybe
the cache and vmalloc ones, so that the remaining patch will be small.

>  
> > > > > > I think if we are changing this, it'd be nice to
> > > > > > create a way for userspace to discover the support
> > > > > > and the # of regions supported.
> > > > > That was my first idea before extending KVM's memslots
> > > > > to teach kernel to tell qemu this number so that QEMU
> > > > > at least would be able to check if new memory slot could
> > > > > be added but I was redirected to a more simple solution
> > > > > of just extending vs everdoing things.
> > > > > Currently QEMU supports upto ~250 memslots so 509
> > > > > is about twice high we need it so it should work for near
> > > > > future
> > > > 
> > > > Yes but old kernels are still around. Would be nice if you
> > > > can detect them.
> > > > 
> > > > > but eventually we might still teach kernel and QEMU
> > > > > to make things more robust.
> > > > 
> > > > A new ioctl would be easy to add, I think it's a good
> > > > idea generally.
> > > I can try to do something like this on top of this series.
> > > 
> > > > 
> > > > > > 
> > > > > > 
> > > > > > > ---
> > > > > > >  drivers/vhost/vhost.c | 2 +-
> > > > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > 
> > > > > > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > > > > > index 99931a0..6a18c92 100644
> > > > > > > --- a/drivers/vhost/vhost.c
> > > > > > > +++ b/drivers/vhost/vhost.c
> > > > > > > @@ -30,7 +30,7 @@
> > > > > > >  #include "vhost.h"
> > > > > > >  
> > > > > > >  enum {
> > > > > > > - VHOST_MEMORY_MAX_NREGIONS = 64,
> > > > > > > + VHOST_MEMORY_MAX_NREGIONS = 509,
> > > > > > >   VHOST_MEMORY_F_LOG = 0x1,
> > > > > > >  };
> > > > > > >  
> > > > > > > -- 
> > > > > > > 1.8.3.1


Re: [PATCH 3/3] powerpc: add hvcall.h header from Linux

2015-06-17 Thread Will Deacon
On Wed, Jun 17, 2015 at 10:43:50AM +0100, Andre Przywara wrote:
> The powerpc code uses some PAPR hypercalls, of which we need the
> hypercall number. Copy the macro definition parts from the kernel's
> (private) hvcall.h file and remove the extra tricks formerly used
> to be able to include this header file directly.
> 
> Signed-off-by: Andre Przywara 
> ---
> Hi,
> 
> I copied most of the Linux header, without removing
> definitions that kvmtool doesn't use. That should make updates
> easier. If people would prefer a bespoke header, let me know.

I'd rather just #define the stuff we need now that we're outside of the
kernel source tree.

Will


Re: [PATCH 1/3] powerpc: implement barrier primitives

2015-06-17 Thread Will Deacon
On Wed, Jun 17, 2015 at 10:43:48AM +0100, Andre Przywara wrote:
> Instead of referring to the Linux header including the barrier
> macros, copy over the rather simple implementation for the PowerPC
> barrier instructions kvmtool uses. This fixes build for powerpc.
> 
> Signed-off-by: Andre Przywara 
> ---
> Hi,
> 
> I just took what kvmtool seems to have used before, I actually have
> no idea if "sync" is the right instruction or "lwsync" would do.
> Would be nice if some people with PowerPC knowledge could comment.

I *think* we can use lwsync for rmb and wmb, but would want confirmation
from a ppc guy before making that change!

Will


Re: [PATCH 3/5] vhost: support upto 509 memory regions

2015-06-17 Thread Igor Mammedov
On Wed, 17 Jun 2015 12:11:09 +0200
"Michael S. Tsirkin"  wrote:

> On Wed, Jun 17, 2015 at 10:54:21AM +0200, Igor Mammedov wrote:
> > On Wed, 17 Jun 2015 09:39:06 +0200
> > "Michael S. Tsirkin"  wrote:
> > 
> > > On Wed, Jun 17, 2015 at 09:28:02AM +0200, Igor Mammedov wrote:
> > > > On Wed, 17 Jun 2015 08:34:26 +0200
> > > > "Michael S. Tsirkin"  wrote:
> > > > 
> > > > > On Wed, Jun 17, 2015 at 12:00:56AM +0200, Igor Mammedov wrote:
> > > > > > On Tue, 16 Jun 2015 23:14:20 +0200
> > > > > > "Michael S. Tsirkin"  wrote:
> > > > > > 
> > > > > > > On Tue, Jun 16, 2015 at 06:33:37PM +0200, Igor Mammedov wrote:
> > > > > > > > since commit
> > > > > > > >  1d4e7e3 kvm: x86: increase user memory slots to 509
> > > > > > > > 
> > > > > > > > it became possible to use a bigger amount of memory
> > > > > > > > slots, which is used by memory hotplug for
> > > > > > > > registering hotplugged memory.
> > > > > > > > However QEMU crashes if it's used with more than ~60
> > > > > > > > pc-dimm devices and vhost-net since host kernel
> > > > > > > > in module vhost-net refuses to accept more than 65
> > > > > > > > memory regions.
> > > > > > > > 
> > > > > > > > Increase VHOST_MEMORY_MAX_NREGIONS from 65 to 509
> > > > > > > 
> > > > > > > It was 64, not 65.
> > > > > > > 
> > > > > > > > to match KVM_USER_MEM_SLOTS fixes issue for vhost-net.
> > > > > > > > 
> > > > > > > > Signed-off-by: Igor Mammedov 
> > > > > > > 
> > > > > > > Still thinking about this: can you reorder this to
> > > > > > > be the last patch in the series please?
> > > > > > sure
> > > > > > 
> > > > > > > 
> > > > > > > Also - 509?
> > > > > > userspace memory slots in terms of KVM, I made it match
> > > > > > KVM's allotment of memory slots for userspace side.
> > > > > 
> > > > > Maybe KVM has its reasons for this #. I don't see
> > > > > why we need to match this exactly.
> > > > np, I can cap it at safe 300 slots but it's unlikely that it
> > > > would take cut off 1 extra hop since it's capped by QEMU
> > > > at 256+[initial fragmented memory]
> > > 
> > > But what's the point? We allocate 32 bytes per slot.
> > > 300*32 = 9600 which is more than 8K, so we are doing
> > > an order-3 allocation anyway.
> > > If we could cap it at 8K (256 slots) that would make sense
> > > since we could avoid wasting vmalloc space.
> > 256 is amount of hotpluggable slots  and there is no way
> > to predict how initial memory would be fragmented
> > (i.e. amount of slots it would take), if we guess wrong
> > we are back to square one with crashing userspace.
> > So I'd stay consistent with KVM's limit 509 since
> > it's only limit, i.e. not actual amount of allocated slots.
> > 
> > > I'm still not very happy with the whole approach,
> > > giving userspace ability allocate 4 whole pages
> > > of kernel memory like this.
> > I'm working in parallel so that userspace won't take so
> > many slots but it won't prevent its current versions
> > crashing due to kernel limitation.
> 
> Right but at least it's not a regression. If we promise userspace to
> support a ton of regions, we can't take it back later, and I'm concerned
> about the memory usage.
> 
> I think it's already safe to merge the binary lookup patches, and maybe
> cache and vmalloc, so that the remaining patch will be small.
it isn't a regression: with the switch to binary search, increasing
slots to 509 is, performance-wise, more on the improvement side.
And I was thinking about memory usage as well; that's why I've dropped
the faster radix tree in favor of a more compact array. I can't do
better on the kernel side of the fix.

Yes, we will give userspace the ability to use more slots (and lock up
more memory) if it's not able to consolidate memory regions, but that
leaves an option for the user to run a guest with vhost performance
instead of crashing it at runtime.

Userspace/targets that could consolidate memory regions should
do so, and I'm working on that as well, but that doesn't mean
that users shouldn't have a choice.
So far it's a kernel limitation, and this patch fixes the crashes
that users see now, with the rest of the patches keeping performance
from regressing.

> 
> >  
> > > > > > > I think if we are changing this, it'd be nice to
> > > > > > > create a way for userspace to discover the support
> > > > > > > and the # of regions supported.
> > > > > > That was my first idea before extending KVM's memslots
> > > > > > to teach kernel to tell qemu this number so that QEMU
> > > > > > at least would be able to check if new memory slot could
> > > > > > be added, but I was redirected to a simpler solution
> > > > > > of just extending vs overdoing things.
> > > > > > Currently QEMU supports up to ~250 memslots, so 509
> > > > > > is about twice as high as we need, so it should work
> > > > > > for the near future
> > > > > 
> > > > > Yes but old kernels are still around. Would be nice if you
> > > > > can detect them.
> > > > > 
> > > > > > but eventually we might still teach kernel and QEMU
> > > > > > to make things more robust.

Re: [PATCH 3/5] vhost: support upto 509 memory regions

2015-06-17 Thread Michael S. Tsirkin
On Wed, Jun 17, 2015 at 12:37:42PM +0200, Igor Mammedov wrote:
> On Wed, 17 Jun 2015 12:11:09 +0200
> "Michael S. Tsirkin"  wrote:
> 
> > On Wed, Jun 17, 2015 at 10:54:21AM +0200, Igor Mammedov wrote:
> > > On Wed, 17 Jun 2015 09:39:06 +0200
> > > "Michael S. Tsirkin"  wrote:
> > > 
> > > > On Wed, Jun 17, 2015 at 09:28:02AM +0200, Igor Mammedov wrote:
> > > > > On Wed, 17 Jun 2015 08:34:26 +0200
> > > > > "Michael S. Tsirkin"  wrote:
> > > > > 
> > > > > > On Wed, Jun 17, 2015 at 12:00:56AM +0200, Igor Mammedov wrote:
> > > > > > > On Tue, 16 Jun 2015 23:14:20 +0200
> > > > > > > "Michael S. Tsirkin"  wrote:
> > > > > > > 
> > > > > > > > On Tue, Jun 16, 2015 at 06:33:37PM +0200, Igor Mammedov wrote:
> > > > > > > > > since commit
> > > > > > > > >  1d4e7e3 kvm: x86: increase user memory slots to 509
> > > > > > > > > 
> > > > > > > > > it became possible to use a bigger amount of memory
> > > > > > > > > slots, which is used by memory hotplug for
> > > > > > > > > registering hotplugged memory.
> > > > > > > > > However QEMU crashes if it's used with more than ~60
> > > > > > > > > pc-dimm devices and vhost-net since host kernel
> > > > > > > > > in module vhost-net refuses to accept more than 65
> > > > > > > > > memory regions.
> > > > > > > > > 
> > > > > > > > > Increase VHOST_MEMORY_MAX_NREGIONS from 65 to 509
> > > > > > > > 
> > > > > > > > It was 64, not 65.
> > > > > > > > 
> > > > > > > > > to match KVM_USER_MEM_SLOTS fixes issue for vhost-net.
> > > > > > > > > 
> > > > > > > > > Signed-off-by: Igor Mammedov 
> > > > > > > > 
> > > > > > > > Still thinking about this: can you reorder this to
> > > > > > > > be the last patch in the series please?
> > > > > > > sure
> > > > > > > 
> > > > > > > > 
> > > > > > > > Also - 509?
> > > > > > > userspace memory slots in terms of KVM, I made it match
> > > > > > > KVM's allotment of memory slots for userspace side.
> > > > > > 
> > > > > > Maybe KVM has its reasons for this #. I don't see
> > > > > > why we need to match this exactly.
> > > > > np, I can cap it at safe 300 slots but it's unlikely that it
> > > > > would take cut off 1 extra hop since it's capped by QEMU
> > > > > at 256+[initial fragmented memory]
> > > > 
> > > > But what's the point? We allocate 32 bytes per slot.
> > > > 300*32 = 9600 which is more than 8K, so we are doing
> > > > an order-3 allocation anyway.
> > > > If we could cap it at 8K (256 slots) that would make sense
> > > > since we could avoid wasting vmalloc space.
> > > 256 is amount of hotpluggable slots  and there is no way
> > > to predict how initial memory would be fragmented
> > > (i.e. amount of slots it would take), if we guess wrong
> > > we are back to square one with crashing userspace.
> > > So I'd stay consistent with KVM's limit 509 since
> > > it's only limit, i.e. not actual amount of allocated slots.
> > > 
> > > > I'm still not very happy with the whole approach,
> > > > giving userspace ability allocate 4 whole pages
> > > > of kernel memory like this.
> > > I'm working in parallel so that userspace won't take so
> > > many slots but it won't prevent its current versions
> > > crashing due to kernel limitation.
> > 
> > Right but at least it's not a regression. If we promise userspace to
> > support a ton of regions, we can't take it back later, and I'm concerned
> > about the memory usage.
> > 
> > I think it's already safe to merge the binary lookup patches, and maybe
> > cache and vmalloc, so that the remaining patch will be small.
> it isn't a regression: with the switch to binary search, increasing
> slots to 509 is, performance-wise, more on the improvement side.
> And I was thinking about memory usage as well; that's why I've dropped
> the faster radix tree in favor of a more compact array. I can't do
> better on the kernel side of the fix.
> 
> Yes, we will give userspace the ability to use more slots (and lock up
> more memory) if it's not able to consolidate memory regions, but that
> leaves an option for the user to run a guest with vhost performance
> instead of crashing it at runtime.

Crashing is entirely QEMU's own doing in not handling
the error gracefully.

> 
> Userspace/targets that could consolidate memory regions should
> do so, and I'm working on that as well, but that doesn't mean
> that users shouldn't have a choice.

It's a fairly unusual corner case; I'm not yet
convinced we need to quickly add support for it when just waiting a bit
longer will get us an equivalent (or even more efficient) fix in
userspace.

> So far it's a kernel limitation, and this patch fixes the crashes
> that users see now, with the rest of the patches keeping performance
> from regressing.

When I say regression I refer to an option to limit the array
size again after userspace started using the larger size.


> > 
> > >  
> > > > > > > > I think if we are changing this, it'd be nice to
> > > > > > > > create a way for userspace to discover the support
> > > > > > > > and the # of regions supported.
> > > > > > > That was my first idea before extending KVM's memslots
> > > > > > > to teach kernel to tell qemu this number so that QEMU
> > > > > > > at least would be able to check if new memory slot could
> > > > > > > be added, but I was redirected to a simpler solution
> > > > > > > of just extending vs overdoing things.

Re: [PATCH 1/3] powerpc: implement barrier primitives

2015-06-17 Thread Alexander Graf


On 17.06.15 12:15, Will Deacon wrote:
> On Wed, Jun 17, 2015 at 10:43:48AM +0100, Andre Przywara wrote:
>> Instead of referring to the Linux header including the barrier
>> macros, copy over the rather simple implementation for the PowerPC
>> barrier instructions kvmtool uses. This fixes build for powerpc.
>>
>> Signed-off-by: Andre Przywara 
>> ---
>> Hi,
>>
>> I just took what kvmtool seems to have used before, I actually have
>> no idea if "sync" is the right instruction or "lwsync" would do.
>> Would be nice if some people with PowerPC knowledge could comment.
> 
> I *think* we can use lwsync for rmb and wmb, but would want confirmation
> from a ppc guy before making that change!

Also I'd prefer to play safe for now :)


Alex


Re: IRQFD support with GICv3 ITS (WAS: RE: [PATCH 00/13] arm64: KVM: GICv3 ITS emulation)

2015-06-17 Thread Andre Przywara
Hello Pavel,

On 06/17/2015 10:21 AM, Pavel Fedin wrote:
>  PING!
>  The discussion has suddenly stopped... What is our status? Is the
> ITS v2 patch being developed, or what?

Yes, I am about to get a v2 ready, but mostly with some fixes. If you
want to work on top of it, I can push a WIP branch to my repo.

As Marc mentioned before, this whole irqfd story does not go together
well with KVM and the ITS architecture, so that needs some more
investigation (which I am planning to do in the next days).

Cheers,
Andre.

> And do we have some conclusion on irqfd?
> 
> Kind regards,
> Pavel Fedin
> Expert Engineer
> Samsung Electronics Research center Russia
> 
> 
>> -Original Message-
>> From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf 
>> Of Pavel
>> Fedin
>> Sent: Wednesday, June 10, 2015 6:30 PM
>> To: 'Eric Auger'; 'Marc Zyngier'; 'Andre Przywara'; 
>> christoffer.d...@linaro.org
>> Cc: kvm...@lists.cs.columbia.edu; linux-arm-ker...@lists.infradead.org;
> kvm@vger.kernel.org
>> Subject: RE: IRQFD support with GICv3 ITS (WAS: RE: [PATCH 00/13] arm64: 
>> KVM: GICv3 ITS
>> emulation)
>>
>>  Hi!
>>
>>> indeed in newly added qemu kvm-all.c kvm_arch_msi_data_to_gsi we could
>>> call a new ioctl that translates the data + deviceid? into an LPI and
>>> program irqfd with that LPI. This is done once when setting irqfd up.
>>> This also means extending irqfd support to lpi injection, gsi being the
>>> LPI index if gsi >= 8192. in that case we continue using
>>> kvm_gsi_direct_mapping and gsi still is an IRQ index.
>>
>>  This is exactly what i have done in my kernel + qemu. I have added a new 
>> KVM capability
>> and then in qemu i do this:
>> --- cut ---
>> if (kvm_gsi_kernel_mapping()) {
>> struct kvm_msi msi;
>>
>> msi.address_lo = (uint32_t)msg.address;
>> msi.address_hi = msg.address >> 32;
>> msi.data = le32_to_cpu(msg.data);
>> memset(msi.pad, 0, sizeof(msi.pad));
>>
>> if (dev) {
>> msi.devid = (pci_bus_num(dev->bus) << 8) | dev->devfn;
>> msi.flags = KVM_MSI_VALID_DEVID;
>> } else {
>> msi.devid = 0;
>> msi.flags = 0;
>> }
>>
>> return kvm_vm_ioctl(s, KVM_TRANSLATE_MSI, &msi);
>> }
>> --- cut ---
>>  KVM_TRANSLATE_MSI returns an LPI number. This seemed to be the simplest and 
>> fastest
> thing
>> to do.
>>  If someone is interested, i could prepare an RFC patch series for this, 
>> which would
> apply
>> on top of Andre's ITS implementation.
>>
>> Kind regards,
>> Pavel Fedin
>> Expert Engineer
>> Samsung Electronics Research center Russia
>>
>>
> 


Re: [PATCH v3 00/18] x86/tsc: Clean up rdtsc helpers

2015-06-17 Thread Borislav Petkov
On Tue, Jun 16, 2015 at 05:35:48PM -0700, Andy Lutomirski wrote:
> My sincere apologies for the spam.  I sent an unholy mixture of the
> real patch set and an old, poorly split-up patch set, and the result
> is incomprehensible.  Here's what I meant to send.
> 
> After some recent threads about rdtsc barriers, I remembered
> that our RDTSC wrappers are a big mess.  Let's clean it up.
> 
> Currently we have rdtscl, rdtscll, native_read_tsc,
> paravirt_read_tsc, and rdtsc_barrier.  For people who haven't
> noticed rdtsc_barrier and who haven't carefully read the docs,
> there's no indication that all of the other accessors have a giant
> ordering gotcha.  The macro forms are ugly, and the paravirt
> implementation is completely pointless.
> 
> rdtscl is particularly awful.  It reads the low bits.  There are no
> performance-critical users of just the low bits anywhere in the
> kernel.
> 
> Clean it up.  After this patch set, there are exactly three
> functions.  rdtsc_unordered() is a function that does a raw RDTSC
> and returns a 64-bit number.  rdtsc_ordered() is a function that
> does a properly ordered RDTSC for general-purpose use.
> barrier_before_rdtsc() is exactly what it sounds like.
> 
> Changes from v2:
>  - Rename rdtsc_unordered to just rdtsc
>  - Get rid of rdtsc_barrier entirely instead of renaming it
>  - The KVM patch is new (see above)
>  - Added some acks

peterz reminded me that I'm lazy actually and don't reply to each patch :)

So, I like it, looks good, nice cleanup. It boots on my guest here - I
haven't done any baremetal testing though. Let's give people some more
time to look at it...

Thanks.

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.


[PATCH v3 00/10] kvmtool: arm64: GICv3 guest support

2015-06-17 Thread Andre Przywara
Hi,

a new version of the GICv3 support series for kvmtool.

I got rid of passing the number of redistributors around kvmtool.
The new patch 06/10 simplifies ARM's MMIO dispatching, so that we no
longer need to know the GIC size at this point. The FDT code uses
base and size values now directly and these values are private to
arm/gic.c.

The new 07/10 patch aims to solve the number-of-VCPUs problem Marc
mentioned. Instead of letting kvmtool have knowledge about particular
limits, let the kernel decide on this matter. Since KVM_CAP_MAX_VCPUS
is not really reliable on ARM, let's be a bit more relaxed about
KVM_CREATE_VCPU failing and stop creating more VCPUs if we get
an EINVAL in return.

I also addressed the other comments Marc gave, but I had to leave
some of the default switch-cases in due to the compiler complaining
otherwise.

Cheers,
Andre.
-

Since Linux 3.19 the kernel can emulate a GICv3 for KVM guests.
This allows more than 8 VCPUs in a guest and enables in-kernel irqchip
for non-backwards-compatible GICv3 implementations.

This series updates kvmtool to support this feature.
The first half of the series is mostly from Marc and supports some
newer features of the virtual GIC which we later depend on. The second
part enables support for a guest GICv3 by adding a new command line
parameter (--irqchip=).

We now use the KVM_CREATE_DEVICE interface to create a virtual GIC
and only fall back to the now legacy KVM_CREATE_IRQCHIP call if the
former is not supported by the kernel.
Also we use two new features the KVM_CREATE_DEVICE interface
introduces:
* We now set the number of actually used interrupts to avoid
  allocating too many of them without ever using them.
* We tell the kernel explicitly that we are finished with the GIC
  initialisation. This is a requirement for future VGIC versions.

The final three patches introduce virtual GICv3 support, so on
supported hardware (and given kernel support) the user can ask KVM to
emulate a GICv3, lifting the 8 VCPU limit of KVM. This is done by
specifying "--irqchip=gicv3" on the command line.
For the time being the kernel only supports a virtual GICv3 on ARM64,
but as the GIC is shared in kvmtool, I had to add the macro
definitions to not break the build on ARM.

This series goes on top of the new official stand-alone repo hosted
on Will's kernel.org git [1].
Find a branch with those patches included at my repo [2].

[1] git://git.kernel.org/pub/scm/linux/kernel/git/will/kvmtool.git
[2] git://linux-arm.org/kvmtool.git (branch gicv3/v3)
http://www.linux-arm.org/git?p=kvmtool.git;a=log;h=refs/heads/gicv3/v3

Andre Przywara (6):
  arm: finish VGIC initialisation explicitly
  arm: simplify MMIO dispatching
  limit number of VCPUs on demand
  arm: prepare for instantiating different IRQ chip devices
  arm: add support for supplying GICv3 redistributor addresses
  arm: use new irqchip parameter to create different vGIC types

Marc Zyngier (4):
  AArch64: Reserve two 64k pages for GIC CPU interface
  AArch{32,64}: use KVM_CREATE_DEVICE & co to instantiate the GIC
  irq: add irq__get_nr_allocated_lines
  AArch{32,64}: dynamically configure the number of GIC interrupts

 arm/aarch32/arm-cpu.c|   2 +-
 arm/aarch64/arm-cpu.c|   2 +-
 arm/aarch64/include/kvm/kvm-arch.h   |   2 +-
 arm/gic.c| 190 +--
 arm/include/arm-common/gic.h |   9 +-
 arm/include/arm-common/kvm-arch.h|  19 ++--
 arm/include/arm-common/kvm-config-arch.h |   9 +-
 arm/include/arm-common/kvm-cpu-arch.h|  14 ++-
 arm/kvm-cpu.c|  27 ++---
 arm/kvm.c|   6 +-
 include/kvm/irq.h|   1 +
 irq.c|   5 +
 kvm-cpu.c|   7 ++
 13 files changed, 242 insertions(+), 51 deletions(-)

-- 
2.3.5



[PATCH v3 09/10] arm: add support for supplying GICv3 redistributor addresses

2015-06-17 Thread Andre Przywara
Instead of the GIC virtual CPU interface an emulated GICv3 needs to
have accesses to its emulated redistributors trapped in the guest.
Add code to tell the kernel about the mapping if a GICv3 emulation was
requested by the user.

This contains some defines which are not (yet) in the (32 bit) header
files to allow compilation for ARM.

Signed-off-by: Andre Przywara 
---
 arm/gic.c | 36 +++-
 arm/include/arm-common/gic.h  |  3 ++-
 arm/include/arm-common/kvm-arch.h |  7 +++
 3 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/arm/gic.c b/arm/gic.c
index b6c5868..efe4b42 100644
--- a/arm/gic.c
+++ b/arm/gic.c
@@ -9,7 +9,18 @@
 #include 
 #include 
 
+/* Those names are not defined for ARM (yet) */
+#ifndef KVM_VGIC_V3_ADDR_TYPE_DIST
+#define KVM_VGIC_V3_ADDR_TYPE_DIST 2
+#endif
+
+#ifndef KVM_VGIC_V3_ADDR_TYPE_REDIST
+#define KVM_VGIC_V3_ADDR_TYPE_REDIST 3
+#endif
+
 static int gic_fd = -1;
+static u64 gic_redists_base;
+static u64 gic_redists_size;
 
 static int gic__create_device(struct kvm *kvm, enum irqchip_type type)
 {
@@ -28,12 +39,21 @@ static int gic__create_device(struct kvm *kvm, enum 
irqchip_type type)
.group  = KVM_DEV_ARM_VGIC_GRP_ADDR,
.addr   = (u64)(unsigned long)&dist_addr,
};
+   struct kvm_device_attr redist_attr = {
+   .group  = KVM_DEV_ARM_VGIC_GRP_ADDR,
+   .attr   = KVM_VGIC_V3_ADDR_TYPE_REDIST,
+   .addr   = (u64)(unsigned long)&gic_redists_base,
+   };
 
switch (type) {
case IRQCHIP_GICV2:
gic_device.type = KVM_DEV_TYPE_ARM_VGIC_V2;
dist_attr.attr  = KVM_VGIC_V2_ADDR_TYPE_DIST;
break;
+   case IRQCHIP_GICV3:
+   gic_device.type = KVM_DEV_TYPE_ARM_VGIC_V3;
+   dist_attr.attr  = KVM_VGIC_V3_ADDR_TYPE_DIST;
+   break;
}
 
err = ioctl(kvm->vm_fd, KVM_CREATE_DEVICE, &gic_device);
@@ -46,6 +66,9 @@ static int gic__create_device(struct kvm *kvm, enum 
irqchip_type type)
case IRQCHIP_GICV2:
err = ioctl(gic_fd, KVM_SET_DEVICE_ATTR, &cpu_if_attr);
break;
+   case IRQCHIP_GICV3:
+   err = ioctl(gic_fd, KVM_SET_DEVICE_ATTR, &redist_attr);
+   break;
}
if (err)
goto out_err;
@@ -97,6 +120,10 @@ int gic__create(struct kvm *kvm, enum irqchip_type type)
switch (type) {
case IRQCHIP_GICV2:
break;
+   case IRQCHIP_GICV3:
+   gic_redists_size = kvm->cfg.nrcpus * ARM_GIC_REDIST_SIZE;
+   gic_redists_base = ARM_GIC_DIST_BASE - gic_redists_size;
+   break;
default:
return -ENODEV;
}
@@ -156,12 +183,19 @@ void gic__generate_fdt_nodes(void *fdt, u32 phandle, enum 
irqchip_type type)
const char *compatible;
u64 reg_prop[] = {
cpu_to_fdt64(ARM_GIC_DIST_BASE), 
cpu_to_fdt64(ARM_GIC_DIST_SIZE),
-   cpu_to_fdt64(ARM_GIC_CPUI_BASE), 
cpu_to_fdt64(ARM_GIC_CPUI_SIZE),
+   0, 0,   /* to be filled */
};
 
switch (type) {
case IRQCHIP_GICV2:
compatible = "arm,cortex-a15-gic";
+   reg_prop[2] = cpu_to_fdt64(ARM_GIC_CPUI_BASE);
+   reg_prop[3] = cpu_to_fdt64(ARM_GIC_CPUI_SIZE);
+   break;
+   case IRQCHIP_GICV3:
+   compatible = "arm,gic-v3";
+   reg_prop[2] = cpu_to_fdt64(gic_redists_base);
+   reg_prop[3] = cpu_to_fdt64(gic_redists_size);
break;
default:
return;
diff --git a/arm/include/arm-common/gic.h b/arm/include/arm-common/gic.h
index 2ed76fa..403d93b 100644
--- a/arm/include/arm-common/gic.h
+++ b/arm/include/arm-common/gic.h
@@ -22,7 +22,8 @@
#define GIC_MAX_IRQ 255
 
 enum irqchip_type {
-   IRQCHIP_GICV2
+   IRQCHIP_GICV2,
+   IRQCHIP_GICV3
 };
 
 struct kvm;
diff --git a/arm/include/arm-common/kvm-arch.h 
b/arm/include/arm-common/kvm-arch.h
index 90d6733..0f5fb7f 100644
--- a/arm/include/arm-common/kvm-arch.h
+++ b/arm/include/arm-common/kvm-arch.h
@@ -30,6 +30,13 @@
 #define KVM_PCI_MMIO_AREA  (KVM_PCI_CFG_AREA + ARM_PCI_CFG_SIZE)
 #define KVM_VIRTIO_MMIO_AREA   ARM_MMIO_AREA
 
+/*
+ * On a GICv3 there must be one redistributor per vCPU.
+ * The value here is the size for one, we multiply this at runtime with
+ * the number of requested vCPUs to get the actual size.
+ */
+#define ARM_GIC_REDIST_SIZE 0x20000
+
 #define KVM_IRQ_OFFSET GIC_SPI_IRQ_BASE
 
 #define KVM_VM_TYPE0
-- 
2.3.5



[PATCH v3 05/10] arm: finish VGIC initialisation explicitly

2015-06-17 Thread Andre Przywara
Since Linux 3.19-rc1 there is a new API to explicitly initialise
the in-kernel GIC emulation by a userland KVM device call.
Use that to tell the kernel we are finished with the GIC
initialisation, since the automatic GIC init will only be provided
as a legacy functionality in the future.

Signed-off-by: Andre Przywara 
Reviewed-by: Marc Zyngier 
---
 arm/gic.c | 25 ++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/arm/gic.c b/arm/gic.c
index 8560c9b..99f0d2b 100644
--- a/arm/gic.c
+++ b/arm/gic.c
@@ -98,24 +98,43 @@ int gic__create(struct kvm *kvm)
return err;
 }
 
+/*
+ * Sets the number of used interrupts and finalizes the GIC init explicitly.
+ */
 static int gic__init_gic(struct kvm *kvm)
 {
+   int ret;
+
int lines = irq__get_nr_allocated_lines();
u32 nr_irqs = ALIGN(lines, 32) + GIC_SPI_IRQ_BASE;
struct kvm_device_attr nr_irqs_attr = {
.group  = KVM_DEV_ARM_VGIC_GRP_NR_IRQS,
.addr   = (u64)(unsigned long)&nr_irqs,
};
+   struct kvm_device_attr vgic_init_attr = {
+   .group  = KVM_DEV_ARM_VGIC_GRP_CTRL,
+   .attr   = KVM_DEV_ARM_VGIC_CTRL_INIT,
+   };
 
/*
 * If we didn't use the KVM_CREATE_DEVICE method, KVM will
-* give us some default number of interrupts.
+* give us some default number of interrupts. The GIC initialization
+* will be done automatically in this case.
 */
if (gic_fd < 0)
return 0;
 
-   if (!ioctl(gic_fd, KVM_HAS_DEVICE_ATTR, &nr_irqs_attr))
-   return ioctl(gic_fd, KVM_SET_DEVICE_ATTR, &nr_irqs_attr);
+   if (!ioctl(gic_fd, KVM_HAS_DEVICE_ATTR, &nr_irqs_attr)) {
+   ret = ioctl(gic_fd, KVM_SET_DEVICE_ATTR, &nr_irqs_attr);
+   if (ret)
+   return ret;
+   }
+
+   if (!ioctl(gic_fd, KVM_HAS_DEVICE_ATTR, &vgic_init_attr)) {
+   ret = ioctl(gic_fd, KVM_SET_DEVICE_ATTR, &vgic_init_attr);
+   if (ret)
+   return ret;
+   }
 
return 0;
 }
-- 
2.3.5



[PATCH v3 01/10] AArch64: Reserve two 64k pages for GIC CPU interface

2015-06-17 Thread Andre Przywara
From: Marc Zyngier 

On AArch64 system with a GICv2, the GICC range can be aligned
to the last 4k block of a 64k page, ending up straddling two
64k pages. In order not to conflict with the distributor mapping,
allocate two 64k pages to the CPU interface.

Signed-off-by: Marc Zyngier 
Signed-off-by: Andre Przywara 
---
 arm/aarch64/include/kvm/kvm-arch.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arm/aarch64/include/kvm/kvm-arch.h 
b/arm/aarch64/include/kvm/kvm-arch.h
index 2f08a26..4925736 100644
--- a/arm/aarch64/include/kvm/kvm-arch.h
+++ b/arm/aarch64/include/kvm/kvm-arch.h
@@ -2,7 +2,7 @@
 #define KVM__KVM_ARCH_H
 
#define ARM_GIC_DIST_SIZE  0x10000
-#define ARM_GIC_CPUI_SIZE  0x10000
+#define ARM_GIC_CPUI_SIZE  0x20000
 
 #define ARM_KERN_OFFSET(kvm)   ((kvm)->cfg.arch.aarch32_guest  ?   \
0x8000  :   \
-- 
2.3.5



[PATCH v3 02/10] AArch{32,64}: use KVM_CREATE_DEVICE & co to instantiate the GIC

2015-06-17 Thread Andre Przywara
From: Marc Zyngier 

As of 3.14, KVM/arm supports the creation/configuration of the GIC through
a more generic device API, which is now the preferred way to do so.

Plumb the new API in, and allow the old code to be used as a fallback.

[Andre: Rename some functions on the way to differentiate between
creation and initialisation more clearly and fix error path.]

Signed-off-by: Marc Zyngier 
Signed-off-by: Andre Przywara 
---
 arm/gic.c| 69 +++-
 arm/include/arm-common/gic.h |  2 +-
 arm/kvm.c|  6 ++--
 3 files changed, 66 insertions(+), 11 deletions(-)

diff --git a/arm/gic.c b/arm/gic.c
index 5d8cbe6..1ff3663 100644
--- a/arm/gic.c
+++ b/arm/gic.c
@@ -7,7 +7,50 @@
 #include 
 #include 
 
-int gic__init_irqchip(struct kvm *kvm)
+static int gic_fd = -1;
+
+static int gic__create_device(struct kvm *kvm)
+{
+   int err;
+   u64 cpu_if_addr = ARM_GIC_CPUI_BASE;
+   u64 dist_addr = ARM_GIC_DIST_BASE;
+   struct kvm_create_device gic_device = {
+   .type   = KVM_DEV_TYPE_ARM_VGIC_V2,
+   };
+   struct kvm_device_attr cpu_if_attr = {
+   .group  = KVM_DEV_ARM_VGIC_GRP_ADDR,
+   .attr   = KVM_VGIC_V2_ADDR_TYPE_CPU,
+   .addr   = (u64)(unsigned long)&cpu_if_addr,
+   };
+   struct kvm_device_attr dist_attr = {
+   .group  = KVM_DEV_ARM_VGIC_GRP_ADDR,
+   .attr   = KVM_VGIC_V2_ADDR_TYPE_DIST,
+   .addr   = (u64)(unsigned long)&dist_addr,
+   };
+
+   err = ioctl(kvm->vm_fd, KVM_CREATE_DEVICE, &gic_device);
+   if (err)
+   return err;
+
+   gic_fd = gic_device.fd;
+
+   err = ioctl(gic_fd, KVM_SET_DEVICE_ATTR, &cpu_if_attr);
+   if (err)
+   goto out_err;
+
+   err = ioctl(gic_fd, KVM_SET_DEVICE_ATTR, &dist_attr);
+   if (err)
+   goto out_err;
+
+   return 0;
+
+out_err:
+   close(gic_fd);
+   gic_fd = -1;
+   return err;
+}
+
+static int gic__create_irqchip(struct kvm *kvm)
 {
int err;
struct kvm_arm_device_addr gic_addr[] = {
@@ -23,12 +66,6 @@ int gic__init_irqchip(struct kvm *kvm)
}
};
 
-   if (kvm->nrcpus > GIC_MAX_CPUS) {
-   pr_warning("%d CPUS greater than maximum of %d -- truncating\n",
-   kvm->nrcpus, GIC_MAX_CPUS);
-   kvm->nrcpus = GIC_MAX_CPUS;
-   }
-
err = ioctl(kvm->vm_fd, KVM_CREATE_IRQCHIP);
if (err)
return err;
@@ -41,6 +78,24 @@ int gic__init_irqchip(struct kvm *kvm)
return err;
 }
 
+int gic__create(struct kvm *kvm)
+{
+   int err;
+
+   if (kvm->nrcpus > GIC_MAX_CPUS) {
+   pr_warning("%d CPUS greater than maximum of %d -- truncating\n",
+   kvm->nrcpus, GIC_MAX_CPUS);
+   kvm->nrcpus = GIC_MAX_CPUS;
+   }
+
+   /* Try the new way first, and fallback on legacy method otherwise */
+   err = gic__create_device(kvm);
+   if (err)
+   err = gic__create_irqchip(kvm);
+
+   return err;
+}
+
 void gic__generate_fdt_nodes(void *fdt, u32 phandle)
 {
u64 reg_prop[] = {
diff --git a/arm/include/arm-common/gic.h b/arm/include/arm-common/gic.h
index 5a36f2c..44859f7 100644
--- a/arm/include/arm-common/gic.h
+++ b/arm/include/arm-common/gic.h
@@ -24,7 +24,7 @@
 struct kvm;
 
 int gic__alloc_irqnum(void);
-int gic__init_irqchip(struct kvm *kvm);
+int gic__create(struct kvm *kvm);
 void gic__generate_fdt_nodes(void *fdt, u32 phandle);
 
 #endif /* ARM_COMMON__GIC_H */
diff --git a/arm/kvm.c b/arm/kvm.c
index 58ad9fa..bcd2533 100644
--- a/arm/kvm.c
+++ b/arm/kvm.c
@@ -81,7 +81,7 @@ void kvm__arch_init(struct kvm *kvm, const char 
*hugetlbfs_path, u64 ram_size)
madvise(kvm->arch.ram_alloc_start, kvm->arch.ram_alloc_size,
MADV_MERGEABLE | MADV_HUGEPAGE);
 
-   /* Initialise the virtual GIC. */
-   if (gic__init_irqchip(kvm))
-   die("Failed to initialise virtual GIC");
+   /* Create the virtual GIC. */
+   if (gic__create(kvm))
+   die("Failed to create virtual GIC");
 }
-- 
2.3.5



[PATCH v3 06/10] arm: simplify MMIO dispatching

2015-06-17 Thread Andre Przywara
Currently we separate any incoming MMIO request into one of the ARM
memory map regions and take care to spare the GIC.
It turns out that this is unnecessary, as we only have one special
region (the IO port area in the first 64 KByte). The MMIO rbtree
takes care of unhandled MMIO ranges, so we can simply drop all the
special range checking (except that for the IO range) in
kvm_cpu__emulate_mmio().
As the GIC is handled in the kernel, a GIC MMIO access should never
reach userland (and we don't know what to do with it anyway).
This lets us delete some more code and simplifies future extensions
(like expanding the GIC regions).
To be in line with the other architectures, move the now simpler
code into a header file.

Signed-off-by: Andre Przywara 
---
 arm/include/arm-common/kvm-arch.h | 12 
 arm/include/arm-common/kvm-cpu-arch.h | 14 --
 arm/kvm-cpu.c | 16 
 3 files changed, 12 insertions(+), 30 deletions(-)

diff --git a/arm/include/arm-common/kvm-arch.h 
b/arm/include/arm-common/kvm-arch.h
index 082131d..90d6733 100644
--- a/arm/include/arm-common/kvm-arch.h
+++ b/arm/include/arm-common/kvm-arch.h
@@ -45,18 +45,6 @@ static inline bool arm_addr_in_ioport_region(u64 phys_addr)
return phys_addr >= KVM_IOPORT_AREA && phys_addr < limit;
 }
 
-static inline bool arm_addr_in_virtio_mmio_region(u64 phys_addr)
-{
-   u64 limit = KVM_VIRTIO_MMIO_AREA + ARM_VIRTIO_MMIO_SIZE;
-   return phys_addr >= KVM_VIRTIO_MMIO_AREA && phys_addr < limit;
-}
-
-static inline bool arm_addr_in_pci_region(u64 phys_addr)
-{
-   u64 limit = KVM_PCI_CFG_AREA + ARM_PCI_CFG_SIZE + ARM_PCI_MMIO_SIZE;
-   return phys_addr >= KVM_PCI_CFG_AREA && phys_addr < limit;
-}
-
 struct kvm_arch {
/*
 * We may have to align the guest memory for virtio, so keep the
diff --git a/arm/include/arm-common/kvm-cpu-arch.h b/arm/include/arm-common/kvm-cpu-arch.h
index 36c7872..329979a 100644
--- a/arm/include/arm-common/kvm-cpu-arch.h
+++ b/arm/include/arm-common/kvm-cpu-arch.h
@@ -44,8 +44,18 @@ static inline bool kvm_cpu__emulate_io(struct kvm_cpu *vcpu, u16 port, void *dat
return false;
 }
 
-bool kvm_cpu__emulate_mmio(struct kvm_cpu *vcpu, u64 phys_addr, u8 *data,
-  u32 len, u8 is_write);
+static inline bool kvm_cpu__emulate_mmio(struct kvm_cpu *vcpu, u64 phys_addr,
+u8 *data, u32 len, u8 is_write)
+{
+   if (arm_addr_in_ioport_region(phys_addr)) {
+   int direction = is_write ? KVM_EXIT_IO_OUT : KVM_EXIT_IO_IN;
+   u16 port = (phys_addr - KVM_IOPORT_AREA) & USHRT_MAX;
+
+   return kvm__emulate_io(vcpu, port, data, direction, len, 1);
+   }
+
+   return kvm__emulate_mmio(vcpu, phys_addr, data, len, is_write);
+}
 
 unsigned long kvm_cpu__get_vcpu_mpidr(struct kvm_cpu *vcpu);
 
diff --git a/arm/kvm-cpu.c b/arm/kvm-cpu.c
index ab08815..7780251 100644
--- a/arm/kvm-cpu.c
+++ b/arm/kvm-cpu.c
@@ -139,22 +139,6 @@ bool kvm_cpu__handle_exit(struct kvm_cpu *vcpu)
return false;
 }
 
-bool kvm_cpu__emulate_mmio(struct kvm_cpu *vcpu, u64 phys_addr, u8 *data,
-  u32 len, u8 is_write)
-{
-   if (arm_addr_in_virtio_mmio_region(phys_addr)) {
-   return kvm__emulate_mmio(vcpu, phys_addr, data, len, is_write);
-   } else if (arm_addr_in_ioport_region(phys_addr)) {
-   int direction = is_write ? KVM_EXIT_IO_OUT : KVM_EXIT_IO_IN;
-   u16 port = (phys_addr - KVM_IOPORT_AREA) & USHRT_MAX;
-   return kvm__emulate_io(vcpu, port, data, direction, len, 1);
-   } else if (arm_addr_in_pci_region(phys_addr)) {
-   return kvm__emulate_mmio(vcpu, phys_addr, data, len, is_write);
-   }
-
-   return false;
-}
-
 void kvm_cpu__show_page_tables(struct kvm_cpu *vcpu)
 {
 }
-- 
2.3.5



[PATCH v3 03/10] irq: add irq__get_nr_allocated_lines

2015-06-17 Thread Andre Przywara
From: Marc Zyngier 

The ARM GIC emulation needs to be told the number of interrupts
it has to support. As commit 1c262fa1dc7bc ("kvm tools: irq: make
irq__alloc_line generic") made the interrupt counter private,
add a new accessor returning the number of interrupt lines we've
allocated so far.

Signed-off-by: Marc Zyngier 
Signed-off-by: Andre Przywara 
---
 include/kvm/irq.h | 1 +
 irq.c | 5 +
 2 files changed, 6 insertions(+)

diff --git a/include/kvm/irq.h b/include/kvm/irq.h
index 4cec6f0..8a78e43 100644
--- a/include/kvm/irq.h
+++ b/include/kvm/irq.h
@@ -11,6 +11,7 @@
 struct kvm;
 
 int irq__alloc_line(void);
+int irq__get_nr_allocated_lines(void);
 
 int irq__init(struct kvm *kvm);
 int irq__exit(struct kvm *kvm);
diff --git a/irq.c b/irq.c
index 33ea8d2..71eaa05 100644
--- a/irq.c
+++ b/irq.c
@@ -7,3 +7,8 @@ int irq__alloc_line(void)
 {
return next_line++;
 }
+
+int irq__get_nr_allocated_lines(void)
+{
+   return next_line - KVM_IRQ_OFFSET;
+}
-- 
2.3.5



[PATCH v3 10/10] arm: use new irqchip parameter to create different vGIC types

2015-06-17 Thread Andre Przywara
Currently we unconditionally create a virtual GICv2 in the guest.
Add a --irqchip= parameter to let the user specify a different GIC
type for the guest.
For now the only other supported type is GICv3.

Signed-off-by: Andre Przywara 
---
 arm/aarch64/arm-cpu.c|  2 +-
 arm/gic.c| 17 +
 arm/include/arm-common/kvm-config-arch.h |  9 -
 arm/kvm.c|  2 +-
 4 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/arm/aarch64/arm-cpu.c b/arm/aarch64/arm-cpu.c
index f702b9e..3dc8ea3 100644
--- a/arm/aarch64/arm-cpu.c
+++ b/arm/aarch64/arm-cpu.c
@@ -12,7 +12,7 @@
 static void generate_fdt_nodes(void *fdt, struct kvm *kvm, u32 gic_phandle)
 {
int timer_interrupts[4] = {13, 14, 11, 10};
-   gic__generate_fdt_nodes(fdt, gic_phandle, IRQCHIP_GICV2);
+   gic__generate_fdt_nodes(fdt, gic_phandle, kvm->cfg.arch.irqchip);
timer__generate_fdt_nodes(fdt, kvm, timer_interrupts);
 }
 
diff --git a/arm/gic.c b/arm/gic.c
index efe4b42..5b49416 100644
--- a/arm/gic.c
+++ b/arm/gic.c
@@ -22,6 +22,23 @@ static int gic_fd = -1;
 static u64 gic_redists_base;
 static u64 gic_redists_size;
 
+int irqchip_parser(const struct option *opt, const char *arg, int unset)
+{
+   enum irqchip_type *type = opt->value;
+
+   *type = IRQCHIP_GICV2;
+   if (!strcmp(arg, "gicv2")) {
+   *type = IRQCHIP_GICV2;
+   } else if (!strcmp(arg, "gicv3")) {
+   *type = IRQCHIP_GICV3;
+   } else if (strcmp(arg, "default")) {
+   fprintf(stderr, "irqchip: unknown type \"%s\"\n", arg);
+   return -1;
+   }
+
+   return 0;
+}
+
 static int gic__create_device(struct kvm *kvm, enum irqchip_type type)
 {
int err;
diff --git a/arm/include/arm-common/kvm-config-arch.h b/arm/include/arm-common/kvm-config-arch.h
index a8ebd94..9529881 100644
--- a/arm/include/arm-common/kvm-config-arch.h
+++ b/arm/include/arm-common/kvm-config-arch.h
@@ -8,8 +8,11 @@ struct kvm_config_arch {
unsigned intforce_cntfrq;
boolvirtio_trans_pci;
boolaarch32_guest;
+   enum irqchip_type irqchip;
 };
 
+int irqchip_parser(const struct option *opt, const char *arg, int unset);
+
#define OPT_ARCH_RUN(pfx, cfg)						\
	pfx,								\
	ARM_OPT_ARCH_RUN(cfg)						\
@@ -21,6 +24,10 @@ struct kvm_config_arch {
		    "updated to program CNTFRQ correctly*"),		\
	OPT_BOOLEAN('\0', "force-pci", &(cfg)->virtio_trans_pci,	\
		    "Force virtio devices to use PCI as their default "	\
-		    "transport"),
+		    "transport"),					\
+	OPT_CALLBACK('\0', "irqchip", &(cfg)->irqchip,			\
+		     "[gicv2|gicv3]",					\
+		     "type of interrupt controller to emulate in the guest",\
+		     irqchip_parser, NULL),
 
 #endif /* ARM_COMMON__KVM_CONFIG_ARCH_H */
diff --git a/arm/kvm.c b/arm/kvm.c
index f9685c2..d0e4a20 100644
--- a/arm/kvm.c
+++ b/arm/kvm.c
@@ -82,6 +82,6 @@ void kvm__arch_init(struct kvm *kvm, const char *hugetlbfs_path, u64 ram_size)
MADV_MERGEABLE | MADV_HUGEPAGE);
 
/* Create the virtual GIC. */
-   if (gic__create(kvm, IRQCHIP_GICV2))
+   if (gic__create(kvm, kvm->cfg.arch.irqchip))
die("Failed to create virtual GIC");
 }
-- 
2.3.5



[PATCH v3 07/10] limit number of VCPUs on demand

2015-06-17 Thread Andre Przywara
Currently the ARM GIC checks the number of VCPUs against a fixed
limit, which is GICv2 specific. Don't pretend we know better than the
kernel and let's get rid of that explicit check.
Instead be more relaxed about KVM_CREATE_VCPU failing with EINVAL,
which is the way the kernel communicates having reached a VCPU limit.
If we see this and have at least brought up one VCPU already
successfully, then don't panic, but limit the number of VCPUs instead.

Signed-off-by: Andre Przywara 
---
 arm/gic.c |  6 --
 arm/kvm-cpu.c | 11 +--
 kvm-cpu.c |  7 +++
 3 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/arm/gic.c b/arm/gic.c
index 99f0d2b..05f85a2 100644
--- a/arm/gic.c
+++ b/arm/gic.c
@@ -84,12 +84,6 @@ int gic__create(struct kvm *kvm)
 {
int err;
 
-   if (kvm->nrcpus > GIC_MAX_CPUS) {
-   pr_warning("%d CPUS greater than maximum of %d -- truncating\n",
-   kvm->nrcpus, GIC_MAX_CPUS);
-   kvm->nrcpus = GIC_MAX_CPUS;
-   }
-
/* Try the new way first, and fallback on legacy method otherwise */
err = gic__create_device(kvm);
if (err)
diff --git a/arm/kvm-cpu.c b/arm/kvm-cpu.c
index 7780251..c1cf51d 100644
--- a/arm/kvm-cpu.c
+++ b/arm/kvm-cpu.c
@@ -47,12 +47,19 @@ struct kvm_cpu *kvm_cpu__arch_init(struct kvm *kvm, unsigned long cpu_id)
};
 
vcpu = calloc(1, sizeof(struct kvm_cpu));
-   if (!vcpu)
+   if (!vcpu) {
+   errno = ENOMEM;
return NULL;
+   }
 
vcpu->vcpu_fd = ioctl(kvm->vm_fd, KVM_CREATE_VCPU, cpu_id);
-   if (vcpu->vcpu_fd < 0)
+   if (vcpu->vcpu_fd < 0) {
+   if (errno == EINVAL) {
+   free(vcpu);
+   return NULL;
+   }
die_perror("KVM_CREATE_VCPU ioctl");
+   }
 
mmap_size = ioctl(kvm->sys_fd, KVM_GET_VCPU_MMAP_SIZE, 0);
if (mmap_size < 0)
diff --git a/kvm-cpu.c b/kvm-cpu.c
index 5d90664..7a9d689 100644
--- a/kvm-cpu.c
+++ b/kvm-cpu.c
@@ -222,11 +222,18 @@ int kvm_cpu__init(struct kvm *kvm)
for (i = 0; i < kvm->nrcpus; i++) {
kvm->cpus[i] = kvm_cpu__arch_init(kvm, i);
if (!kvm->cpus[i]) {
+   if (i > 0 && errno == EINVAL)
+   break;
pr_warning("unable to initialize KVM VCPU");
goto fail_alloc;
}
}
 
+   if (i < kvm->nrcpus) {
+   kvm->nrcpus = i;
+   printf("  # The kernel limits the number of CPUs to %d\n", i);
+   }
+
return 0;
 
 fail_alloc:
-- 
2.3.5



[PATCH v3 08/10] arm: prepare for instantiating different IRQ chip devices

2015-06-17 Thread Andre Przywara
Extend the vGIC handling code to potentially deal with different IRQ
chip devices instead of hard-coding the GICv2 in.
We extend most vGIC functions to take a type parameter, but still put
GICv2 in at the top for the time being.

Signed-off-by: Andre Przywara 
---
 arm/aarch32/arm-cpu.c|  2 +-
 arm/aarch64/arm-cpu.c|  2 +-
 arm/gic.c| 44 +++-
 arm/include/arm-common/gic.h |  8 ++--
 arm/kvm.c|  2 +-
 5 files changed, 44 insertions(+), 14 deletions(-)

diff --git a/arm/aarch32/arm-cpu.c b/arm/aarch32/arm-cpu.c
index 946e443..d8d6293 100644
--- a/arm/aarch32/arm-cpu.c
+++ b/arm/aarch32/arm-cpu.c
@@ -12,7 +12,7 @@ static void generate_fdt_nodes(void *fdt, struct kvm *kvm, u32 gic_phandle)
 {
int timer_interrupts[4] = {13, 14, 11, 10};
 
-   gic__generate_fdt_nodes(fdt, gic_phandle);
+   gic__generate_fdt_nodes(fdt, gic_phandle, IRQCHIP_GICV2);
timer__generate_fdt_nodes(fdt, kvm, timer_interrupts);
 }
 
diff --git a/arm/aarch64/arm-cpu.c b/arm/aarch64/arm-cpu.c
index 8efe877..f702b9e 100644
--- a/arm/aarch64/arm-cpu.c
+++ b/arm/aarch64/arm-cpu.c
@@ -12,7 +12,7 @@
 static void generate_fdt_nodes(void *fdt, struct kvm *kvm, u32 gic_phandle)
 {
int timer_interrupts[4] = {13, 14, 11, 10};
-   gic__generate_fdt_nodes(fdt, gic_phandle);
+   gic__generate_fdt_nodes(fdt, gic_phandle, IRQCHIP_GICV2);
timer__generate_fdt_nodes(fdt, kvm, timer_interrupts);
 }
 
diff --git a/arm/gic.c b/arm/gic.c
index 05f85a2..b6c5868 100644
--- a/arm/gic.c
+++ b/arm/gic.c
@@ -11,13 +11,13 @@
 
 static int gic_fd = -1;
 
-static int gic__create_device(struct kvm *kvm)
+static int gic__create_device(struct kvm *kvm, enum irqchip_type type)
 {
int err;
u64 cpu_if_addr = ARM_GIC_CPUI_BASE;
u64 dist_addr = ARM_GIC_DIST_BASE;
struct kvm_create_device gic_device = {
-   .type   = KVM_DEV_TYPE_ARM_VGIC_V2,
+   .flags  = 0,
};
struct kvm_device_attr cpu_if_attr = {
.group  = KVM_DEV_ARM_VGIC_GRP_ADDR,
@@ -26,17 +26,27 @@ static int gic__create_device(struct kvm *kvm)
};
struct kvm_device_attr dist_attr = {
.group  = KVM_DEV_ARM_VGIC_GRP_ADDR,
-   .attr   = KVM_VGIC_V2_ADDR_TYPE_DIST,
.addr   = (u64)(unsigned long)&dist_addr,
};
 
+   switch (type) {
+   case IRQCHIP_GICV2:
+   gic_device.type = KVM_DEV_TYPE_ARM_VGIC_V2;
+   dist_attr.attr  = KVM_VGIC_V2_ADDR_TYPE_DIST;
+   break;
+   }
+
err = ioctl(kvm->vm_fd, KVM_CREATE_DEVICE, &gic_device);
if (err)
return err;
 
gic_fd = gic_device.fd;
 
-   err = ioctl(gic_fd, KVM_SET_DEVICE_ATTR, &cpu_if_attr);
+   switch (type) {
+   case IRQCHIP_GICV2:
+   err = ioctl(gic_fd, KVM_SET_DEVICE_ATTR, &cpu_if_attr);
+   break;
+   }
if (err)
goto out_err;
 
@@ -80,13 +90,20 @@ static int gic__create_irqchip(struct kvm *kvm)
return err;
 }
 
-int gic__create(struct kvm *kvm)
+int gic__create(struct kvm *kvm, enum irqchip_type type)
 {
int err;
 
+   switch (type) {
+   case IRQCHIP_GICV2:
+   break;
+   default:
+   return -ENODEV;
+   }
+
/* Try the new way first, and fallback on legacy method otherwise */
-   err = gic__create_device(kvm);
-   if (err)
+   err = gic__create_device(kvm, type);
+   if (err && type == IRQCHIP_GICV2)
err = gic__create_irqchip(kvm);
 
return err;
@@ -134,15 +151,24 @@ static int gic__init_gic(struct kvm *kvm)
 }
 late_init(gic__init_gic)
 
-void gic__generate_fdt_nodes(void *fdt, u32 phandle)
+void gic__generate_fdt_nodes(void *fdt, u32 phandle, enum irqchip_type type)
 {
+   const char *compatible;
u64 reg_prop[] = {
cpu_to_fdt64(ARM_GIC_DIST_BASE), cpu_to_fdt64(ARM_GIC_DIST_SIZE),
cpu_to_fdt64(ARM_GIC_CPUI_BASE), cpu_to_fdt64(ARM_GIC_CPUI_SIZE),
};
 
+   switch (type) {
+   case IRQCHIP_GICV2:
+   compatible = "arm,cortex-a15-gic";
+   break;
+   default:
+   return;
+   }
+
_FDT(fdt_begin_node(fdt, "intc"));
-   _FDT(fdt_property_string(fdt, "compatible", "arm,cortex-a15-gic"));
+   _FDT(fdt_property_string(fdt, "compatible", compatible));
_FDT(fdt_property_cell(fdt, "#interrupt-cells", GIC_FDT_IRQ_NUM_CELLS));
_FDT(fdt_property(fdt, "interrupt-controller", NULL, 0));
_FDT(fdt_property(fdt, "reg", reg_prop, sizeof(reg_prop)));
diff --git a/arm/include/arm-common/gic.h b/arm/include/arm-common/gic.h
index 44859f7..2ed76fa 100644
--- a/arm/include/arm-common/gic.h
+++ b/arm/include/arm-common/gic.h
@@ -21,10 +21,14 @@
 #define GIC_MAX_CPUS   8
 #define GIC_MAX_IRQ  

[PATCH v3 04/10] AArch{32,64}: dynamically configure the number of GIC interrupts

2015-06-17 Thread Andre Przywara
From: Marc Zyngier 

In order to reduce the memory usage of large guests (as well
as improve performance), tell KVM about the number of interrupts
we require.

To avoid synchronization with the various device creation,
use a late_init callback to compute the GIC configuration.
[Andre: rename to gic__init_gic() to ease future expansion]

Signed-off-by: Marc Zyngier 
Signed-off-by: Andre Przywara 
---
 arm/gic.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/arm/gic.c b/arm/gic.c
index 1ff3663..8560c9b 100644
--- a/arm/gic.c
+++ b/arm/gic.c
@@ -1,10 +1,12 @@
 #include "kvm/fdt.h"
+#include "kvm/irq.h"
 #include "kvm/kvm.h"
 #include "kvm/virtio.h"
 
 #include "arm-common/gic.h"
 
 #include 
+#include 
 #include 
 
 static int gic_fd = -1;
@@ -96,6 +98,29 @@ int gic__create(struct kvm *kvm)
return err;
 }
 
+static int gic__init_gic(struct kvm *kvm)
+{
+   int lines = irq__get_nr_allocated_lines();
+   u32 nr_irqs = ALIGN(lines, 32) + GIC_SPI_IRQ_BASE;
+   struct kvm_device_attr nr_irqs_attr = {
+   .group  = KVM_DEV_ARM_VGIC_GRP_NR_IRQS,
+   .addr   = (u64)(unsigned long)&nr_irqs,
+   };
+
+   /*
+* If we didn't use the KVM_CREATE_DEVICE method, KVM will
+* give us some default number of interrupts.
+*/
+   if (gic_fd < 0)
+   return 0;
+
+   if (!ioctl(gic_fd, KVM_HAS_DEVICE_ATTR, &nr_irqs_attr))
+   return ioctl(gic_fd, KVM_SET_DEVICE_ATTR, &nr_irqs_attr);
+
+   return 0;
+}
+late_init(gic__init_gic)
+
 void gic__generate_fdt_nodes(void *fdt, u32 phandle)
 {
u64 reg_prop[] = {
-- 
2.3.5



RE: IRQFD support with GICv3 ITS (WAS: RE: [PATCH 00/13] arm64: KVM: GICv3 ITS emulation)

2015-06-17 Thread Pavel Fedin
 Hello!

> Yes, I am about to get a v2 ready, but mostly with some fixes. If you
> want to work on top of it, I can push a WIP branch to my repo.

 Thank you, but no need to hurry. I am busy with other things too. And,
anyway, I work on top of my own branch here.

> As Marc mentioned before, this whole irqfd story does not go together
> well with KVM and the ITS architecture, 

 It actually does with GSI routing (see 
http://www.spinics.net/lists/kvm/msg117475.html, approach 2).

> so that needs some more
> investigation (which I am planning to do in the next days).

 Ok.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia




Re: [PATCH 3/5] vhost: support upto 509 memory regions

2015-06-17 Thread Igor Mammedov
On Wed, 17 Jun 2015 12:46:09 +0200
"Michael S. Tsirkin"  wrote:

> On Wed, Jun 17, 2015 at 12:37:42PM +0200, Igor Mammedov wrote:
> > On Wed, 17 Jun 2015 12:11:09 +0200
> > "Michael S. Tsirkin"  wrote:
> > 
> > > On Wed, Jun 17, 2015 at 10:54:21AM +0200, Igor Mammedov wrote:
> > > > On Wed, 17 Jun 2015 09:39:06 +0200
> > > > "Michael S. Tsirkin"  wrote:
> > > > 
> > > > > On Wed, Jun 17, 2015 at 09:28:02AM +0200, Igor Mammedov wrote:
> > > > > > On Wed, 17 Jun 2015 08:34:26 +0200
> > > > > > "Michael S. Tsirkin"  wrote:
> > > > > > 
> > > > > > > On Wed, Jun 17, 2015 at 12:00:56AM +0200, Igor Mammedov wrote:
> > > > > > > > On Tue, 16 Jun 2015 23:14:20 +0200
> > > > > > > > "Michael S. Tsirkin"  wrote:
> > > > > > > > 
> > > > > > > > > On Tue, Jun 16, 2015 at 06:33:37PM +0200, Igor Mammedov wrote:
> > > > > > > > > > since commit
> > > > > > > > > >  1d4e7e3 kvm: x86: increase user memory slots to 509
> > > > > > > > > > 
> > > > > > > > > > it became possible to use a bigger amount of memory
> > > > > > > > > > slots, which is used by memory hotplug for
> > > > > > > > > > registering hotplugged memory.
> > > > > > > > > > However QEMU crashes if it's used with more than ~60
> > > > > > > > > > pc-dimm devices and vhost-net since host kernel
> > > > > > > > > > in module vhost-net refuses to accept more than 65
> > > > > > > > > > memory regions.
> > > > > > > > > > 
> > > > > > > > > > Increase VHOST_MEMORY_MAX_NREGIONS from 65 to 509
> > > > > > > > > 
> > > > > > > > > It was 64, not 65.
> > > > > > > > > 
> > > > > > > > > > to match KVM_USER_MEM_SLOTS fixes issue for vhost-net.
> > > > > > > > > > 
> > > > > > > > > > Signed-off-by: Igor Mammedov 
> > > > > > > > > 
> > > > > > > > > Still thinking about this: can you reorder this to
> > > > > > > > > be the last patch in the series please?
> > > > > > > > sure
> > > > > > > > 
> > > > > > > > > 
> > > > > > > > > Also - 509?
> > > > > > > > userspace memory slots in terms of KVM, I made it match
> > > > > > > > KVM's allotment of memory slots for userspace side.
> > > > > > > 
> > > > > > > Maybe KVM has its reasons for this #. I don't see
> > > > > > > why we need to match this exactly.
> > > > > > np, I can cap it at safe 300 slots but it's unlikely that it
> > > > > > would take cut off 1 extra hop since it's capped by QEMU
> > > > > > at 256+[initial fragmented memory]
> > > > > 
> > > > > But what's the point? We allocate 32 bytes per slot.
> > > > > 300*32 = 9600 which is more than 8K, so we are doing
> > > > > an order-3 allocation anyway.
> > > > > If we could cap it at 8K (256 slots) that would make sense
> > > > > since we could avoid wasting vmalloc space.
> > > > 256 is amount of hotpluggable slots  and there is no way
> > > > to predict how initial memory would be fragmented
> > > > (i.e. amount of slots it would take), if we guess wrong
> > > > we are back to square one with crashing userspace.
> > > > So I'd stay consistent with KVM's limit 509 since
> > > > it's only limit, i.e. not actual amount of allocated slots.
> > > > 
> > > > > I'm still not very happy with the whole approach,
> > > > > giving userspace ability allocate 4 whole pages
> > > > > of kernel memory like this.
> > > > I'm working in parallel so that userspace won't take so
> > > > many slots but it won't prevent its current versions
> > > > crashing due to kernel limitation.
> > > 
> > > Right but at least it's not a regression. If we promise userspace to
> > > support a ton of regions, we can't take it back later, and I'm concerned
> > > about the memory usage.
> > > 
> > > I think it's already safe to merge the binary lookup patches, and maybe
> > > cache and vmalloc, so that the remaining patch will be small.
> > it isn't regression with switching to binary search and increasing
> > slots to 509 either performance wise it's more on improvment side.
> > And I was thinking about memory usage as well, that's why I've dropped
> > faster radix tree in favor of more compact array, can't do better
> > on kernel side of fix.
> > 
> > Yes we will give userspace to ability to use more slots/and lock up
> > more memory if it's not able to consolidate memory regions but
> > that leaves an option for user to run guest with vhost performance
> > vs crashing it at runtime.
> 
> Crashing is entirely QEMU's own doing in not handling
> the error gracefully.
and that's hard to fix (handle error gracefully) the way it's implemented now.

> > 
> > userspace/targets that could consolidate memory regions should
> > do so and I'm working on that as well but that doesn't mean
> > that users shouldn't have a choice.
> 
> It's a fairly unusual corner case, I'm not yet
> convinced we need to quickly add support to it when just waiting a bit
> longer will get us an equivalent (or even more efficient) fix in
> userspace.
memory hotplug support in QEMU has been released for quite a long time
already and there are users who rely on it, so a fix in a future QEMU
won't make it work with their di

Re: [PATCH 1/2] KVM: fix checkpatch.pl errors in kvm/async_pf.h

2015-06-17 Thread Paolo Bonzini


On 16/06/2015 13:33, Kevin Mulvey wrote:
> fix brace spacing
> 
> Signed-off-by: Kevin Mulvey 
> ---
>  virt/kvm/async_pf.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/virt/kvm/async_pf.h b/virt/kvm/async_pf.h
> index e7ef6447..ec4cfa2 100644
> --- a/virt/kvm/async_pf.h
> +++ b/virt/kvm/async_pf.h
> @@ -29,8 +29,8 @@ void kvm_async_pf_deinit(void);
>  void kvm_async_pf_vcpu_init(struct kvm_vcpu *vcpu);
>  #else
>  #define kvm_async_pf_init() (0)
> -#define kvm_async_pf_deinit() do{}while(0)
> -#define kvm_async_pf_vcpu_init(C) do{}while(0)
> +#define kvm_async_pf_deinit() do {} while (0)
> +#define kvm_async_pf_vcpu_init(C) do {} while (0)
>  #endif
>  
>  #endif
> 

Applied both, thanks.

Paolo


Re: [PATCH 3/5] vhost: support upto 509 memory regions

2015-06-17 Thread Michael S. Tsirkin
On Wed, Jun 17, 2015 at 01:48:03PM +0200, Igor Mammedov wrote:
> > > So far it's kernel limitation and this patch fixes crashes
> > > that users see now, with the rest of patches enabling performance
> > > not to regress.
> > 
> > When I say regression I refer to an option to limit the array
> > size again after userspace started using the larger size.
> Is there a need to do so?

Considering userspace can be malicious, I guess yes.

> Userspace that cares about memory footprint won't use many slots
> keeping it low and user space that can't do without many slots
> or doesn't care will have bigger memory footprint.

We really can't trust userspace to do the right thing though.

-- 
MST


Re: [PATCH 07/10] KVM: arm/arm64: vgic: Allow HW interrupts to be queued to a guest

2015-06-17 Thread Eric Auger
Hi Marc,
On 06/08/2015 07:04 PM, Marc Zyngier wrote:
> To allow a HW interrupt to be injected into a guest, we lookup the
> guest virtual interrupt in the irq_phys_map rbtree, and if we have
> a match, encode both interrupts in the LR.
> 
> We also mark the interrupt as "active" at the host distributor level.
> 
> On guest EOI on the virtual interrupt, the host interrupt will be
> deactivated.
A "standard" physical IRQ would first be handled by the host handler,
which would ack and deactivate it once. Here, if my understanding is
correct, the virtual counter PPI never fires. Instead we "emulate" it on
world-switch by directly setting the distributor state. Is that correct?
If so, it is quite a specific handling of a "HW" IRQ.

> 
> Signed-off-by: Marc Zyngier 
> ---
>  virt/kvm/arm/vgic.c | 71 ++---
>  1 file changed, 68 insertions(+), 3 deletions(-)
> 
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index c6604f2..495ac7d 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -1120,6 +1120,26 @@ static void vgic_queue_irq_to_lr(struct kvm_vcpu *vcpu, int irq,
>   if (!vgic_irq_is_edge(vcpu, irq))
>   vlr.state |= LR_EOI_INT;
>  
> + if (vlr.irq >= VGIC_NR_SGIS) {
> + struct irq_phys_map *map;
> + map = vgic_irq_map_search(vcpu, irq);
> +
> + if (map) {
> + int ret;
> +
> + BUG_ON(!map->active);
> + vlr.hwirq = map->phys_irq;
> + vlr.state |= LR_HW;
> + vlr.state &= ~LR_EOI_INT;
> +
> + ret = irq_set_irqchip_state(map->irq,
> + IRQCHIP_STATE_ACTIVE,
> + true);
> + vgic_irq_set_queued(vcpu, irq);
queued state was used for level sensitive IRQs only. Forwarded or "HW"
IRQs theoretically can be edge or level sensitive, right? If yes, it may
be worth justifying the usage of the queued state for forwarded IRQs.
Also, vgic_irq_set_queued was rather called in the parent
vgic_queue_hwirq today.

> + WARN_ON(ret);
> + }
> + }
> +
>   vgic_set_lr(vcpu, lr_nr, vlr);
>   vgic_sync_lr_elrsr(vcpu, lr_nr, vlr);
>  }
> @@ -1344,6 +1364,35 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
>   return level_pending;
>  }
>  
> +/* Return 1 if HW interrupt went from active to inactive, and 0 otherwise */
> +static int vgic_sync_hwirq(struct kvm_vcpu *vcpu, struct vgic_lr vlr)
> +{
> + struct irq_phys_map *map;
> + int ret;
> +
> + if (!(vlr.state & LR_HW))
> + return 0;
> +
> + map = vgic_irq_map_search(vcpu, vlr.irq);
> + BUG_ON(!map || !map->active);
> +
> + ret = irq_get_irqchip_state(map->irq,
> + IRQCHIP_STATE_ACTIVE,
> + &map->active);
Doesn't it work because the virtual timer was disabled during the world
switch? Does this characterize all "shared" devices? It is difficult for
me to tell how much of this is specific to the arch timer integration.
> +
> + WARN_ON(ret);
> +
> + if (map->active) {
> + ret = irq_set_irqchip_state(map->irq,
> + IRQCHIP_STATE_ACTIVE,
> + false);
> + WARN_ON(ret);
> + return 0;
> + }
> +
> + return 1;
> +}
> +
>  /* Sync back the VGIC state after a guest run */
>  static void __kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
>  {
> @@ -1358,14 +1407,30 @@ static void __kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
>   elrsr = vgic_get_elrsr(vcpu);
>   elrsr_ptr = u64_to_bitmask(&elrsr);
>  
> - /* Clear mappings for empty LRs */
> - for_each_set_bit(lr, elrsr_ptr, vgic->nr_lr) {
> + /* Deal with HW interrupts, and clear mappings for empty LRs */
> + for (lr = 0; lr < vgic->nr_lr; lr++) {
>   struct vgic_lr vlr;
>  
> - if (!test_and_clear_bit(lr, vgic_cpu->lr_used))
> + if (!test_bit(lr, vgic_cpu->lr_used))
>   continue;
>  
>   vlr = vgic_get_lr(vcpu, lr);
> + if (vgic_sync_hwirq(vcpu, vlr)) {
> + /*
> +  * So this is a HW interrupt that the guest
> +  * EOI-ed. Clean the LR state and allow the
> +  * interrupt to be queued again.
> +  */
> + vlr.state &= ~LR_HW;
> + vlr.hwirq = 0;
> + vgic_set_lr(vcpu, lr, vlr);
> + vgic_irq_clear_queued(vcpu, vlr.irq)
not necessarily a level sensitive IRQ?

- Eric
> + }
> +
> + if (!test_bit(lr, elrsr_ptr))
> + continue;
> +
> + clear_bit(lr, vgic_cpu->lr_used);
>  
>   BUG_ON(vlr.irq >= 

Re: [PATCH 04/10] KVM: arm/arm64: vgic: Allow HW irq to be encoded in LR

2015-06-17 Thread Eric Auger
On 06/08/2015 07:03 PM, Marc Zyngier wrote:
> Now that struct vgic_lr supports the LR_HW bit and carries a hwirq
> field, we can encode that information into the list registers.
> 
> This patch provides implementations for both GICv2 and GICv3.
> 
> Signed-off-by: Marc Zyngier 
> ---
>  include/linux/irqchip/arm-gic-v3.h |  3 +++
>  include/linux/irqchip/arm-gic.h|  3 ++-
>  virt/kvm/arm/vgic-v2.c | 16 +++-
>  virt/kvm/arm/vgic-v3.c | 21 ++---
>  4 files changed, 38 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h
> index ffbc034..cf637d6 100644
> --- a/include/linux/irqchip/arm-gic-v3.h
> +++ b/include/linux/irqchip/arm-gic-v3.h
> @@ -268,9 +268,12 @@
>  
>  #define ICH_LR_EOI   (1UL << 41)
>  #define ICH_LR_GROUP (1UL << 60)
> +#define ICH_LR_HW(1UL << 61)
>  #define ICH_LR_STATE (3UL << 62)
>  #define ICH_LR_PENDING_BIT   (1UL << 62)
>  #define ICH_LR_ACTIVE_BIT(1UL << 63)
> +#define ICH_LR_PHYS_ID_SHIFT 32
> +#define ICH_LR_PHYS_ID_MASK  (0x3ffUL << ICH_LR_PHYS_ID_SHIFT)
>  
>  #define ICH_MISR_EOI (1 << 0)
>  #define ICH_MISR_U   (1 << 1)
> diff --git a/include/linux/irqchip/arm-gic.h b/include/linux/irqchip/arm-gic.h
> index 9de976b..ca88dad 100644
> --- a/include/linux/irqchip/arm-gic.h
> +++ b/include/linux/irqchip/arm-gic.h
> @@ -71,11 +71,12 @@
>  
>  #define GICH_LR_VIRTUALID(0x3ff << 0)
>  #define GICH_LR_PHYSID_CPUID_SHIFT   (10)
> -#define GICH_LR_PHYSID_CPUID (7 << GICH_LR_PHYSID_CPUID_SHIFT)
> +#define GICH_LR_PHYSID_CPUID (0x3ff << GICH_LR_PHYSID_CPUID_SHIFT)
>  #define GICH_LR_STATE(3 << 28)
>  #define GICH_LR_PENDING_BIT  (1 << 28)
>  #define GICH_LR_ACTIVE_BIT   (1 << 29)
>  #define GICH_LR_EOI  (1 << 19)
> +#define GICH_LR_HW   (1 << 31)
>  
>  #define GICH_VMCR_CTRL_SHIFT 0
>  #define GICH_VMCR_CTRL_MASK  (0x21f << GICH_VMCR_CTRL_SHIFT)
> diff --git a/virt/kvm/arm/vgic-v2.c b/virt/kvm/arm/vgic-v2.c
> index f9b9c7c..8d7b04d 100644
> --- a/virt/kvm/arm/vgic-v2.c
> +++ b/virt/kvm/arm/vgic-v2.c
> @@ -48,6 +48,10 @@ static struct vgic_lr vgic_v2_get_lr(const struct kvm_vcpu *vcpu, int lr)
>   lr_desc.state |= LR_STATE_ACTIVE;
>   if (val & GICH_LR_EOI)
>   lr_desc.state |= LR_EOI_INT;
> + if (val & GICH_LR_HW) {
> + lr_desc.state |= LR_HW;
> + lr_desc.hwirq = (val & GICH_LR_PHYSID_CPUID) >> GICH_LR_PHYSID_CPUID_SHIFT;
> + }
>  
>   return lr_desc;
>  }
> @@ -55,7 +59,9 @@ static struct vgic_lr vgic_v2_get_lr(const struct kvm_vcpu *vcpu, int lr)
>  static void vgic_v2_set_lr(struct kvm_vcpu *vcpu, int lr,
>  struct vgic_lr lr_desc)
>  {
> - u32 lr_val = (lr_desc.source << GICH_LR_PHYSID_CPUID_SHIFT) | lr_desc.irq;
> + u32 lr_val;
> +
> + lr_val = lr_desc.irq;
>  
>   if (lr_desc.state & LR_STATE_PENDING)
>   lr_val |= GICH_LR_PENDING_BIT;
> @@ -64,6 +70,14 @@ static void vgic_v2_set_lr(struct kvm_vcpu *vcpu, int lr,
>   if (lr_desc.state & LR_EOI_INT)
>   lr_val |= GICH_LR_EOI;
>  
> + if (lr_desc.state & LR_HW) {
> + lr_val |= GICH_LR_HW;
> + lr_val |= (u32)lr_desc.hwirq << GICH_LR_PHYSID_CPUID_SHIFT;
shouldn't we test somewhere that the hwirq is between 16 and 1019?
Otherwise behavior is unpredictable according to the v2 spec. When
queuing into the LR we currently check the Linux irq with
vlr.irq >= VGIC_NR_SGIS, if I am not wrong.

besides Reviewed-by: Eric Auger 

Eric
> + }
> +
> + if (lr_desc.irq < VGIC_NR_SGIS)
> + lr_val |= (lr_desc.source << GICH_LR_PHYSID_CPUID_SHIFT);
> +
>   vcpu->arch.vgic_cpu.vgic_v2.vgic_lr[lr] = lr_val;
>  }
>  
> diff --git a/virt/kvm/arm/vgic-v3.c b/virt/kvm/arm/vgic-v3.c
> index dff0602..afbf925 100644
> --- a/virt/kvm/arm/vgic-v3.c
> +++ b/virt/kvm/arm/vgic-v3.c
> @@ -67,6 +67,10 @@ static struct vgic_lr vgic_v3_get_lr(const struct kvm_vcpu *vcpu, int lr)
>   lr_desc.state |= LR_STATE_ACTIVE;
>   if (val & ICH_LR_EOI)
>   lr_desc.state |= LR_EOI_INT;
> + if (val & ICH_LR_HW) {
> + lr_desc.state |= LR_HW;
> + lr_desc.hwirq = (val >> ICH_LR_PHYS_ID_SHIFT) & GENMASK(9, 0);
> + }
>  
>   return lr_desc;
>  }
> @@ -84,10 +88,17 @@ static void vgic_v3_set_lr(struct kvm_vcpu *vcpu, int lr,
>* Eventually we want to make this configurable, so we may revisit
>* this in the future.
>*/
> - if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3)
> + switch (vcpu->kvm->arch.vgic.vgic_model) {
> + case KVM_DEV_TYPE_ARM_VGIC_V3:
>   lr_val |= ICH_LR_GROUP;
not related to that patch but why LR_GR

Re: [PATCH 07/10] KVM: arm/arm64: vgic: Allow HW interrupts to be queued to a guest

2015-06-17 Thread Marc Zyngier
Hi Eric,

On 17/06/15 12:51, Eric Auger wrote:
> Hi Marc,
> On 06/08/2015 07:04 PM, Marc Zyngier wrote:
>> To allow a HW interrupt to be injected into a guest, we lookup the
>> guest virtual interrupt in the irq_phys_map rbtree, and if we have
>> a match, encode both interrupts in the LR.
>>
>> We also mark the interrupt as "active" at the host distributor level.
>>
>> On guest EOI on the virtual interrupt, the host interrupt will be
>> deactivated.
>
> a "standard" physical IRQ would be first handled by the host handler
> which would ack and deactivate it a first time. Here, if my
> understanding is correct, the virtual counter PPI never hits. Instead we
> "emulate" it on world-switch by directly setting the dist state. Is that
> correct? If yes it is quite a specific handling of an "HW" IRQ.

This is (mostly) correct. Because we deal with HW that is shared between
guests, we absolutely need to make that HW quiescent before getting back
to the host. Setting the active bit in the distributor allows us to
restore the HW in a state that shows a pending interrupt at the guest
level, but ensure that the interrupt doesn't fire at the host level.

As for the "specificity", this is how the architecture has been
designed, and the way we're expected to deal with this kind of shared
HW. Rest assured I didn't come up with that on my own! ;-)

> 
>>
>> Signed-off-by: Marc Zyngier 
>> ---
>>  virt/kvm/arm/vgic.c | 71 
>> ++---
>>  1 file changed, 68 insertions(+), 3 deletions(-)
>>
>> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
>> index c6604f2..495ac7d 100644
>> --- a/virt/kvm/arm/vgic.c
>> +++ b/virt/kvm/arm/vgic.c
>> @@ -1120,6 +1120,26 @@ static void vgic_queue_irq_to_lr(struct kvm_vcpu 
>> *vcpu, int irq,
>>  if (!vgic_irq_is_edge(vcpu, irq))
>>  vlr.state |= LR_EOI_INT;
>>  
>> +if (vlr.irq >= VGIC_NR_SGIS) {
>> +struct irq_phys_map *map;
>> +map = vgic_irq_map_search(vcpu, irq);
>> +
>> +if (map) {
>> +int ret;
>> +
>> +BUG_ON(!map->active);
>> +vlr.hwirq = map->phys_irq;
>> +vlr.state |= LR_HW;
>> +vlr.state &= ~LR_EOI_INT;
>> +
>> +ret = irq_set_irqchip_state(map->irq,
>> +IRQCHIP_STATE_ACTIVE,
>> +true);
>> +vgic_irq_set_queued(vcpu, irq);
>
> queued state was used for level-sensitive IRQs only. Forwarded or "HW"
> IRQs theoretically can be edge or level sensitive, right? If yes, it may be
> worth justifying the usage of the queued state for forwarded IRQs. Also

That's because it is illegal to set a HW interrupt to be PENDING+ACTIVE,
which means we have to prevent the interrupt from being injected multiple
times. The behaviour is sufficiently close to what we do for a level
interrupt that we use the same state.

> vgic_irq_set_queued rather was called in parent vgic_queue_hwirq today.

I tried to keep the HW bit madness as localized as possible. Letting it
spread further away seems to make the code more difficult to read IMHO.

> 
>> +WARN_ON(ret);
>> +}
>> +}
>> +
>>  vgic_set_lr(vcpu, lr_nr, vlr);
>>  vgic_sync_lr_elrsr(vcpu, lr_nr, vlr);
>>  }
>> @@ -1344,6 +1364,35 @@ static bool vgic_process_maintenance(struct kvm_vcpu 
>> *vcpu)
>>  return level_pending;
>>  }
>>  
>> +/* Return 1 if HW interrupt went from active to inactive, and 0 otherwise */
>> +static int vgic_sync_hwirq(struct kvm_vcpu *vcpu, struct vgic_lr vlr)
>> +{
>> +struct irq_phys_map *map;
>> +int ret;
>> +
>> +if (!(vlr.state & LR_HW))
>> +return 0;
>> +
>> +map = vgic_irq_map_search(vcpu, vlr.irq);
>> +BUG_ON(!map || !map->active);
>> +
>> +ret = irq_get_irqchip_state(map->irq,
>> +IRQCHIP_STATE_ACTIVE,
>> +&map->active);
>
> Doesn't it work because the virtual timer was disabled during the world
> switch? Does it characterize all "shared" devices? It is difficult for me to
> understand how much of this is specific to the arch timer integration.

Shared devices cannot be left running when the guest is not running
because (a) we have lost the context (the guest), and (b) we need to
give it to another guest. This is a fundamental property of this kind of
resource.

This is by no means specific to the timer, BTW. The VGIC itself is a
shared resource, and we nuke it on each exit, for the same reason. The
only difference is that we don't propagate the VGIC interrupt to a guest.

>> +
>> +WARN_ON(ret);
>> +
>> +if (map->active) {
>> +ret = irq_set_irqchip_state(map->irq,
>> +IRQCHIP_STATE_ACTIVE,
>> +false);
>> +WARN_ON(ret);
>> +return 0;
>> +}
>> +
>> +ret

Re: [PATCH 3/5] vhost: support upto 509 memory regions

2015-06-17 Thread Igor Mammedov
On Wed, 17 Jun 2015 13:51:56 +0200
"Michael S. Tsirkin"  wrote:

> On Wed, Jun 17, 2015 at 01:48:03PM +0200, Igor Mammedov wrote:
> > > > So far it's kernel limitation and this patch fixes crashes
> > > > that users see now, with the rest of patches enabling performance
> > > > not to regress.
> > > 
> > > When I say regression I refer to an option to limit the array
> > > size again after userspace started using the larger size.
> > Is there a need to do so?
> 
> Considering userspace can be malicious, I guess yes.
I don't think that's a valid concern in this case:
setting the limit back from 509 to 64 would not help here in any way,
since userspace can still create as many vhost instances as it needs
to consume the memory it desires.

> 
> > Userspace that cares about memory footprint won't use many slots
> > keeping it low and user space that can't do without many slots
> > or doesn't care will have bigger memory footprint.
> 
> We really can't trust userspace to do the right thing though.
> 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: nSVM: Check for NRIPS support before updating control field

2015-06-17 Thread Paolo Bonzini


On 11/06/2015 08:05, Bandan Das wrote:
> 
> If hardware doesn't support DecodeAssist - a feature that provides
> more information about the intercept in the VMCB, KVM decodes the
> instruction and then updates the next_rip vmcb control field.
> However, NRIP support itself depends on cpuid Fn8000_000A_EDX[NRIPS].
> Since skip_emulated_instruction() doesn't verify nrip support
> before accepting control.next_rip as valid, avoid writing this
> field if support isn't present.
> 
> Signed-off-by: Bandan Das 
> ---
>  arch/x86/kvm/svm.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 9afa233..4911bf1 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -511,8 +511,10 @@ static void skip_emulated_instruction(struct kvm_vcpu 
> *vcpu)
>  {
>   struct vcpu_svm *svm = to_svm(vcpu);
>  
> - if (svm->vmcb->control.next_rip != 0)
> + if (svm->vmcb->control.next_rip != 0) {
> + WARN_ON(!static_cpu_has(X86_FEATURE_NRIPS));
>   svm->next_rip = svm->vmcb->control.next_rip;
> + }
>  
>   if (!svm->next_rip) {
>   if (emulate_instruction(vcpu, EMULTYPE_SKIP) !=
> @@ -4317,7 +4319,9 @@ static int svm_check_intercept(struct kvm_vcpu *vcpu,
>   break;
>   }
>  
> - vmcb->control.next_rip  = info->next_rip;
> + /* TODO: Advertise NRIPS to guest hypervisor unconditionally */
> + if (static_cpu_has(X86_FEATURE_NRIPS))
> + vmcb->control.next_rip  = info->next_rip;
>   vmcb->control.exit_code = icpt_info.exit_code;
>   vmexit = nested_svm_exit_handled(svm);
>  
> 

Applied, thanks.

paolo


Re: [PATCH 04/10] KVM: arm/arm64: vgic: Allow HW irq to be encoded in LR

2015-06-17 Thread Marc Zyngier
On 17/06/15 12:53, Eric Auger wrote:
> On 06/08/2015 07:03 PM, Marc Zyngier wrote:
>> Now that struct vgic_lr supports the LR_HW bit and carries a hwirq
>> field, we can encode that information into the list registers.
>>
>> This patch provides implementations for both GICv2 and GICv3.
>>
>> Signed-off-by: Marc Zyngier 
>> ---
>>  include/linux/irqchip/arm-gic-v3.h |  3 +++
>>  include/linux/irqchip/arm-gic.h|  3 ++-
>>  virt/kvm/arm/vgic-v2.c | 16 +++-
>>  virt/kvm/arm/vgic-v3.c | 21 ++---
>>  4 files changed, 38 insertions(+), 5 deletions(-)
>>
>> diff --git a/include/linux/irqchip/arm-gic-v3.h 
>> b/include/linux/irqchip/arm-gic-v3.h
>> index ffbc034..cf637d6 100644
>> --- a/include/linux/irqchip/arm-gic-v3.h
>> +++ b/include/linux/irqchip/arm-gic-v3.h
>> @@ -268,9 +268,12 @@
>>  
>>  #define ICH_LR_EOI  (1UL << 41)
>>  #define ICH_LR_GROUP (1UL << 60)
>> +#define ICH_LR_HW   (1UL << 61)
>>  #define ICH_LR_STATE (3UL << 62)
>>  #define ICH_LR_PENDING_BIT  (1UL << 62)
>>  #define ICH_LR_ACTIVE_BIT   (1UL << 63)
>> +#define ICH_LR_PHYS_ID_SHIFT 32
>> +#define ICH_LR_PHYS_ID_MASK (0x3ffUL << ICH_LR_PHYS_ID_SHIFT)
>>  
>>  #define ICH_MISR_EOI (1 << 0)
>>  #define ICH_MISR_U  (1 << 1)
>> diff --git a/include/linux/irqchip/arm-gic.h 
>> b/include/linux/irqchip/arm-gic.h
>> index 9de976b..ca88dad 100644
>> --- a/include/linux/irqchip/arm-gic.h
>> +++ b/include/linux/irqchip/arm-gic.h
>> @@ -71,11 +71,12 @@
>>  
>>  #define GICH_LR_VIRTUALID   (0x3ff << 0)
>>  #define GICH_LR_PHYSID_CPUID_SHIFT  (10)
>> -#define GICH_LR_PHYSID_CPUID(7 << 
>> GICH_LR_PHYSID_CPUID_SHIFT)
>> +#define GICH_LR_PHYSID_CPUID(0x3ff << 
>> GICH_LR_PHYSID_CPUID_SHIFT)
>>  #define GICH_LR_STATE   (3 << 28)
>>  #define GICH_LR_PENDING_BIT (1 << 28)
>>  #define GICH_LR_ACTIVE_BIT  (1 << 29)
>>  #define GICH_LR_EOI (1 << 19)
>> +#define GICH_LR_HW  (1 << 31)
>>  
>>  #define GICH_VMCR_CTRL_SHIFT 0
>>  #define GICH_VMCR_CTRL_MASK (0x21f << GICH_VMCR_CTRL_SHIFT)
>> diff --git a/virt/kvm/arm/vgic-v2.c b/virt/kvm/arm/vgic-v2.c
>> index f9b9c7c..8d7b04d 100644
>> --- a/virt/kvm/arm/vgic-v2.c
>> +++ b/virt/kvm/arm/vgic-v2.c
>> @@ -48,6 +48,10 @@ static struct vgic_lr vgic_v2_get_lr(const struct 
>> kvm_vcpu *vcpu, int lr)
>>  lr_desc.state |= LR_STATE_ACTIVE;
>>  if (val & GICH_LR_EOI)
>>  lr_desc.state |= LR_EOI_INT;
>> +if (val & GICH_LR_HW) {
>> +lr_desc.state |= LR_HW;
>> +lr_desc.hwirq = (val & GICH_LR_PHYSID_CPUID) >> 
>> GICH_LR_PHYSID_CPUID_SHIFT;
>> +}
>>  
>>  return lr_desc;
>>  }
>> @@ -55,7 +59,9 @@ static struct vgic_lr vgic_v2_get_lr(const struct kvm_vcpu 
>> *vcpu, int lr)
>>  static void vgic_v2_set_lr(struct kvm_vcpu *vcpu, int lr,
>> struct vgic_lr lr_desc)
>>  {
>> -u32 lr_val = (lr_desc.source << GICH_LR_PHYSID_CPUID_SHIFT) | 
>> lr_desc.irq;
>> +u32 lr_val;
>> +
>> +lr_val = lr_desc.irq;
>>  
>>  if (lr_desc.state & LR_STATE_PENDING)
>>  lr_val |= GICH_LR_PENDING_BIT;
>> @@ -64,6 +70,14 @@ static void vgic_v2_set_lr(struct kvm_vcpu *vcpu, int lr,
>>  if (lr_desc.state & LR_EOI_INT)
>>  lr_val |= GICH_LR_EOI;
>>  
>> +if (lr_desc.state & LR_HW) {
>> +lr_val |= GICH_LR_HW;
>> +lr_val |= (u32)lr_desc.hwirq << GICH_LR_PHYSID_CPUID_SHIFT;
>
> shouldn't we test somewhere that the hwirq is between 16 and 1019? Otherwise
> the behavior is unpredictable according to the GICv2 spec. When queuing into
> the LR we currently check the Linux irq with vlr.irq >= VGIC_NR_SGIS, if I am
> not wrong.

This is actually implicit. vgic_map_phys_irq() takes a parameter (irq)
that is the Linux view of the hwirq we're dealing with (we fetch this
hwirq by traversing the irq_data list associated with irq).

SGIs are not part of the set of interrupts that can be mapped to a Linux
irq (their usage is completely private to the two GIC drivers).

Note that GICv3 allows SGIs to be set as a physical interrupt in an LR
though, but this is not a feature we use so far.

> besides Reviewed-by: Eric Auger 

Thanks!

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH 1/2] kvm/x86: Hyper-V based guest crash data handling

2015-06-17 Thread Paolo Bonzini


On 11/06/2015 15:18, Denis V. Lunev wrote:
> From: Andrey Smetanin 
> 
> Windows 2012 guests can notify the hypervisor about an occurred guest crash
> (Windows bugcheck (BSOD)) by writing specific Hyper-V MSRs. This patch
> handles these MSRs in KVM and sends a notification to user space, which
> allows QEMU/libvirt to gather a Windows guest crash dump.
> 
> The idea is to provide functionality equal to pvpanic device without
> QEMU guest agent for Windows.
> 
> The idea is borrowed from Linux HyperV bus driver and validated against
> Windows 2k12.
> 
> Signed-off-by: Andrey Smetanin 
> Signed-off-by: Denis V. Lunev 
> CC: Gleb Natapov 
> CC: Paolo Bonzini 
> ---
>  arch/x86/include/uapi/asm/hyperv.h | 10 +
>  arch/x86/kvm/Makefile  |  2 +-
>  arch/x86/kvm/mshv.c| 84 
> ++
>  arch/x86/kvm/mshv.h| 32 +++

Please use hyperv.[ch] or hyper-v.[ch] and name the functions kvm_hv_*.
We can later move more functions from x86.c to the new file, so it's
better to keep the names consistent.

>  arch/x86/kvm/x86.c | 25 
>  include/linux/kvm_host.h   | 17 
>  include/uapi/linux/kvm.h   | 11 +
>  7 files changed, 180 insertions(+), 1 deletion(-)
>  create mode 100644 arch/x86/kvm/mshv.c
>  create mode 100644 arch/x86/kvm/mshv.h
> 
> diff --git a/arch/x86/include/uapi/asm/hyperv.h 
> b/arch/x86/include/uapi/asm/hyperv.h
> index ce6068d..25f3064 100644
> --- a/arch/x86/include/uapi/asm/hyperv.h
> +++ b/arch/x86/include/uapi/asm/hyperv.h
> @@ -199,6 +199,16 @@
>  #define HV_X64_MSR_STIMER3_CONFIG 0x400000B6
>  #define HV_X64_MSR_STIMER3_COUNT 0x400000B7
>  
> +
> +/* Hyper-V guest crash notification MSRs */
> +#define HV_X64_MSR_CRASH_P0  0x40000100
> +#define HV_X64_MSR_CRASH_P1  0x40000101
> +#define HV_X64_MSR_CRASH_P2  0x40000102
> +#define HV_X64_MSR_CRASH_P3  0x40000103
> +#define HV_X64_MSR_CRASH_P4  0x40000104
> +#define HV_X64_MSR_CRASH_CTL 0x40000105
> +#define HV_CRASH_CTL_CRASH_NOTIFY (1ULL << 63)
> +
>  #define HV_X64_MSR_HYPERCALL_ENABLE  0x00000001
>  #define HV_X64_MSR_HYPERCALL_PAGE_ADDRESS_SHIFT  12
>  #define HV_X64_MSR_HYPERCALL_PAGE_ADDRESS_MASK   \
> diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
> index 16e8f96..b1ec24d 100644
> --- a/arch/x86/kvm/Makefile
> +++ b/arch/x86/kvm/Makefile
> @@ -12,7 +12,7 @@ kvm-y   += $(KVM)/kvm_main.o 
> $(KVM)/coalesced_mmio.o \
>  kvm-$(CONFIG_KVM_ASYNC_PF)   += $(KVM)/async_pf.o
>  
>  kvm-y += x86.o mmu.o emulate.o i8259.o irq.o lapic.o \
> -i8254.o ioapic.o irq_comm.o cpuid.o pmu.o
> +i8254.o ioapic.o irq_comm.o cpuid.o pmu.o mshv.o
>  kvm-$(CONFIG_KVM_DEVICE_ASSIGNMENT)  += assigned-dev.o iommu.o
>  kvm-intel-y  += vmx.o
>  kvm-amd-y += svm.o
> diff --git a/arch/x86/kvm/mshv.c b/arch/x86/kvm/mshv.c
> new file mode 100644
> index 000..ad367c44
> --- /dev/null
> +++ b/arch/x86/kvm/mshv.c
> @@ -0,0 +1,84 @@
> +/*
> + * KVM Microsoft Hyper-V extended paravirtualization
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.  See
> + * the COPYING file in the top-level directory.
> + *
> + * Copyright (C) 2015 Andrey Smetanin 
> + *
> + * Authors: Andrey Smetanin asmeta...@virtuozzo.com
> + */
> +
> +#include 
> +#include "mshv.h"
> +
> +int kvm_mshv_ctx_create(struct kvm *kvm)
> +{
> + struct kvm_mshv_ctx *ctx;
> +
> + ctx = kzalloc(sizeof(struct kvm_mshv_ctx), GFP_KERNEL);
> + if (!ctx)
> + return -ENOMEM;
> +
> + ctx->kvm = kvm;
> + atomic_set(&ctx->crash_pending, 0);
> + kvm->mshv_ctx = ctx;
> + return 0;
> +}
> +
> +void kvm_mshv_ctx_destroy(struct kvm *kvm)
> +{
> + kfree(kvm->mshv_ctx);
> +}
> +
> +int kvm_mshv_msr_get_crash_ctl(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
> +{
> + struct kvm_mshv_ctx *ctx = kvm_vcpu_get_mshv_ctx(vcpu);
> +
> + atomic_set(&ctx->crash_pending, 1);
> +
> + /* Response that KVM ready to receive crash data */
> + *pdata = HV_CRASH_CTL_CRASH_NOTIFY;
> + return 0;
> +}
> +
> +int kvm_mshv_msr_set_crash_ctl(struct kvm_vcpu *vcpu, u32 msr, u64 data)
> +{
> + struct kvm_mshv_ctx *ctx = kvm_vcpu_get_mshv_ctx(vcpu);
> +
> + if (atomic_dec_and_test(&ctx->crash_pending)) {
> + pr_debug("vcpu %p 0x%llx 0x%llx 0x%llx 0x%llx 0x%llx",
> +  vcpu, ctx->crash_p0, ctx->crash_p1, ctx->crash_p2,
> +  ctx->crash_p3, ctx->crash_p4);
> +
> + /* Crash data almost gathered so notify user space */

Why "almost" gathered?

> + kvm_make_request(KVM_REQ_MSHV_CRASH, vcpu);
> + }
> +
> + return 0;
> +}
> +
> +int kvm_mshv_msr_set_crash_data(struct kvm_vcpu *vcpu, u32 m

Re: [PATCH v3 06/10] arm: simplify MMIO dispatching

2015-06-17 Thread Marc Zyngier
On 17/06/15 12:21, Andre Przywara wrote:
> Currently we separate any incoming MMIO request into one of the ARM
> memory map regions and take care to spare the GIC.
> It turns out that this is unnecessary, as we only have one special
> region (the IO port area in the first 64 KByte). The MMIO rbtree
> takes care of unhandled MMIO ranges, so we can simply drop all the
> special range checking (except that for the IO range) in
> kvm_cpu__emulate_mmio().
> As the GIC is handled in the kernel, a GIC MMIO access should never
> reach userland (and we don't know what to do with it anyway).
> This lets us delete some more code and simplifies future extensions
> (like expanding the GIC regions).
> To be in line with the other architectures, move the now simpler
> code into a header file.
> 
> Signed-off-by: Andre Przywara 
> ---
>  arm/include/arm-common/kvm-arch.h | 12 
>  arm/include/arm-common/kvm-cpu-arch.h | 14 --
>  arm/kvm-cpu.c | 16 
>  3 files changed, 12 insertions(+), 30 deletions(-)
> 
> diff --git a/arm/include/arm-common/kvm-arch.h 
> b/arm/include/arm-common/kvm-arch.h
> index 082131d..90d6733 100644
> --- a/arm/include/arm-common/kvm-arch.h
> +++ b/arm/include/arm-common/kvm-arch.h
> @@ -45,18 +45,6 @@ static inline bool arm_addr_in_ioport_region(u64 phys_addr)
>   return phys_addr >= KVM_IOPORT_AREA && phys_addr < limit;
>  }
>  
> -static inline bool arm_addr_in_virtio_mmio_region(u64 phys_addr)
> -{
> - u64 limit = KVM_VIRTIO_MMIO_AREA + ARM_VIRTIO_MMIO_SIZE;
> - return phys_addr >= KVM_VIRTIO_MMIO_AREA && phys_addr < limit;
> -}
> -
> -static inline bool arm_addr_in_pci_region(u64 phys_addr)
> -{
> - u64 limit = KVM_PCI_CFG_AREA + ARM_PCI_CFG_SIZE + ARM_PCI_MMIO_SIZE;
> - return phys_addr >= KVM_PCI_CFG_AREA && phys_addr < limit;
> -}
> -
>  struct kvm_arch {
>   /*
>* We may have to align the guest memory for virtio, so keep the
> diff --git a/arm/include/arm-common/kvm-cpu-arch.h 
> b/arm/include/arm-common/kvm-cpu-arch.h
> index 36c7872..329979a 100644
> --- a/arm/include/arm-common/kvm-cpu-arch.h
> +++ b/arm/include/arm-common/kvm-cpu-arch.h
> @@ -44,8 +44,18 @@ static inline bool kvm_cpu__emulate_io(struct kvm_cpu 
> *vcpu, u16 port, void *dat
>   return false;
>  }
>  
> -bool kvm_cpu__emulate_mmio(struct kvm_cpu *vcpu, u64 phys_addr, u8 *data,
> -u32 len, u8 is_write);
> +static inline bool kvm_cpu__emulate_mmio(struct kvm_cpu *vcpu, u64 phys_addr,
> +  u8 *data, u32 len, u8 is_write)
> +{
> + if (arm_addr_in_ioport_region(phys_addr)) {
> + int direction = is_write ? KVM_EXIT_IO_OUT : KVM_EXIT_IO_IN;
> + u16 port = (phys_addr - KVM_IOPORT_AREA) & USHRT_MAX;
> +
> + return kvm__emulate_io(vcpu, port, data, direction, len, 1);
> + }
> +
> + return kvm__emulate_mmio(vcpu, phys_addr, data, len, is_write);
> +}
>  
>  unsigned long kvm_cpu__get_vcpu_mpidr(struct kvm_cpu *vcpu);
>  
> diff --git a/arm/kvm-cpu.c b/arm/kvm-cpu.c
> index ab08815..7780251 100644
> --- a/arm/kvm-cpu.c
> +++ b/arm/kvm-cpu.c
> @@ -139,22 +139,6 @@ bool kvm_cpu__handle_exit(struct kvm_cpu *vcpu)
>   return false;
>  }
>  
> -bool kvm_cpu__emulate_mmio(struct kvm_cpu *vcpu, u64 phys_addr, u8 *data,
> -u32 len, u8 is_write)
> -{
> - if (arm_addr_in_virtio_mmio_region(phys_addr)) {
> - return kvm__emulate_mmio(vcpu, phys_addr, data, len, is_write);
> - } else if (arm_addr_in_ioport_region(phys_addr)) {
> - int direction = is_write ? KVM_EXIT_IO_OUT : KVM_EXIT_IO_IN;
> - u16 port = (phys_addr - KVM_IOPORT_AREA) & USHRT_MAX;
> - return kvm__emulate_io(vcpu, port, data, direction, len, 1);
> - } else if (arm_addr_in_pci_region(phys_addr)) {
> - return kvm__emulate_mmio(vcpu, phys_addr, data, len, is_write);
> - }

Can you explain why this arm_addr_in_pci_region(phys_addr) check has
disappeared in your updated version of this function? It may be a
non-issue, but I'd very much like to understand.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH v3 07/10] limit number of VCPUs on demand

2015-06-17 Thread Marc Zyngier
On 17/06/15 12:21, Andre Przywara wrote:
> Currently the ARM GIC checks the number of VCPUs against a fixed
> limit, which is GICv2 specific. Don't pretend we know better than the
> kernel and let's get rid of that explicit check.
> Instead be more relaxed about KVM_CREATE_VCPU failing with EINVAL,
> which is the way the kernel communicates having reached a VCPU limit.
> If we see this and have at least brought up one VCPU already
> successfully, then don't panic, but limit the number of VCPUs instead.
> 
> Signed-off-by: Andre Przywara 
> ---
>  arm/gic.c |  6 --
>  arm/kvm-cpu.c | 11 +--
>  kvm-cpu.c |  7 +++
>  3 files changed, 16 insertions(+), 8 deletions(-)
> 
> diff --git a/arm/gic.c b/arm/gic.c
> index 99f0d2b..05f85a2 100644
> --- a/arm/gic.c
> +++ b/arm/gic.c
> @@ -84,12 +84,6 @@ int gic__create(struct kvm *kvm)
>  {
>   int err;
>  
> - if (kvm->nrcpus > GIC_MAX_CPUS) {
> - pr_warning("%d CPUS greater than maximum of %d -- truncating\n",
> - kvm->nrcpus, GIC_MAX_CPUS);
> - kvm->nrcpus = GIC_MAX_CPUS;
> - }
> -
>   /* Try the new way first, and fallback on legacy method otherwise */
>   err = gic__create_device(kvm);
>   if (err)
> diff --git a/arm/kvm-cpu.c b/arm/kvm-cpu.c
> index 7780251..c1cf51d 100644
> --- a/arm/kvm-cpu.c
> +++ b/arm/kvm-cpu.c
> @@ -47,12 +47,19 @@ struct kvm_cpu *kvm_cpu__arch_init(struct kvm *kvm, 
> unsigned long cpu_id)
>   };
>  
>   vcpu = calloc(1, sizeof(struct kvm_cpu));
> - if (!vcpu)
> + if (!vcpu) {
> + errno = ENOMEM;
>   return NULL;
> + }

Isn't errno already set when calloc fails?

>  
>   vcpu->vcpu_fd = ioctl(kvm->vm_fd, KVM_CREATE_VCPU, cpu_id);
> - if (vcpu->vcpu_fd < 0)
> + if (vcpu->vcpu_fd < 0) {
> + if (errno == EINVAL) {
> + free(vcpu);
> + return NULL;
> + }
>   die_perror("KVM_CREATE_VCPU ioctl");
> + }
>  
>   mmap_size = ioctl(kvm->sys_fd, KVM_GET_VCPU_MMAP_SIZE, 0);
>   if (mmap_size < 0)
> diff --git a/kvm-cpu.c b/kvm-cpu.c
> index 5d90664..7a9d689 100644
> --- a/kvm-cpu.c
> +++ b/kvm-cpu.c
> @@ -222,11 +222,18 @@ int kvm_cpu__init(struct kvm *kvm)
>   for (i = 0; i < kvm->nrcpus; i++) {
>   kvm->cpus[i] = kvm_cpu__arch_init(kvm, i);
>   if (!kvm->cpus[i]) {
> + if (i > 0 && errno == EINVAL)
> + break;
>   pr_warning("unable to initialize KVM VCPU");
>   goto fail_alloc;
>   }
>   }
>  
> + if (i < kvm->nrcpus) {
> + kvm->nrcpus = i;
> + printf("  # The kernel limits the number of CPUs to %d\n", i);
> + }
> +
>   return 0;
>  
>  fail_alloc:
> 

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH v3 08/10] arm: prepare for instantiating different IRQ chip devices

2015-06-17 Thread Marc Zyngier
On 17/06/15 12:21, Andre Przywara wrote:
> Extend the vGIC handling code to potentially deal with different IRQ
> chip devices instead of hard-coding the GICv2 in.
> We extend most vGIC functions to take a type parameter, but still put
> GICv2 in at the top for the time being.
> 
> Signed-off-by: Andre Przywara 
> ---
>  arm/aarch32/arm-cpu.c|  2 +-
>  arm/aarch64/arm-cpu.c|  2 +-
>  arm/gic.c| 44 
> +++-
>  arm/include/arm-common/gic.h |  8 ++--
>  arm/kvm.c|  2 +-
>  5 files changed, 44 insertions(+), 14 deletions(-)
> 
> diff --git a/arm/aarch32/arm-cpu.c b/arm/aarch32/arm-cpu.c
> index 946e443..d8d6293 100644
> --- a/arm/aarch32/arm-cpu.c
> +++ b/arm/aarch32/arm-cpu.c
> @@ -12,7 +12,7 @@ static void generate_fdt_nodes(void *fdt, struct kvm *kvm, 
> u32 gic_phandle)
>  {
>   int timer_interrupts[4] = {13, 14, 11, 10};
>  
> - gic__generate_fdt_nodes(fdt, gic_phandle);
> + gic__generate_fdt_nodes(fdt, gic_phandle, IRQCHIP_GICV2);
>   timer__generate_fdt_nodes(fdt, kvm, timer_interrupts);
>  }
>  
> diff --git a/arm/aarch64/arm-cpu.c b/arm/aarch64/arm-cpu.c
> index 8efe877..f702b9e 100644
> --- a/arm/aarch64/arm-cpu.c
> +++ b/arm/aarch64/arm-cpu.c
> @@ -12,7 +12,7 @@
>  static void generate_fdt_nodes(void *fdt, struct kvm *kvm, u32 gic_phandle)
>  {
>   int timer_interrupts[4] = {13, 14, 11, 10};
> - gic__generate_fdt_nodes(fdt, gic_phandle);
> + gic__generate_fdt_nodes(fdt, gic_phandle, IRQCHIP_GICV2);
>   timer__generate_fdt_nodes(fdt, kvm, timer_interrupts);
>  }
>  
> diff --git a/arm/gic.c b/arm/gic.c
> index 05f85a2..b6c5868 100644
> --- a/arm/gic.c
> +++ b/arm/gic.c
> @@ -11,13 +11,13 @@
>  
>  static int gic_fd = -1;
>  
> -static int gic__create_device(struct kvm *kvm)
> +static int gic__create_device(struct kvm *kvm, enum irqchip_type type)
>  {
>   int err;
>   u64 cpu_if_addr = ARM_GIC_CPUI_BASE;
>   u64 dist_addr = ARM_GIC_DIST_BASE;
>   struct kvm_create_device gic_device = {
> - .type   = KVM_DEV_TYPE_ARM_VGIC_V2,
> + .flags  = 0,
>   };
>   struct kvm_device_attr cpu_if_attr = {
>   .group  = KVM_DEV_ARM_VGIC_GRP_ADDR,
> @@ -26,17 +26,27 @@ static int gic__create_device(struct kvm *kvm)
>   };
>   struct kvm_device_attr dist_attr = {
>   .group  = KVM_DEV_ARM_VGIC_GRP_ADDR,
> - .attr   = KVM_VGIC_V2_ADDR_TYPE_DIST,
>   .addr   = (u64)(unsigned long)&dist_addr,
>   };
>  
> + switch (type) {
> + case IRQCHIP_GICV2:
> + gic_device.type = KVM_DEV_TYPE_ARM_VGIC_V2;
> + dist_attr.attr  = KVM_VGIC_V2_ADDR_TYPE_DIST;
> + break;
> + }
> +
>   err = ioctl(kvm->vm_fd, KVM_CREATE_DEVICE, &gic_device);
>   if (err)
>   return err;
>  
>   gic_fd = gic_device.fd;
>  
> - err = ioctl(gic_fd, KVM_SET_DEVICE_ATTR, &cpu_if_attr);
> + switch (type) {
> + case IRQCHIP_GICV2:
> + err = ioctl(gic_fd, KVM_SET_DEVICE_ATTR, &cpu_if_attr);
> + break;
> + }
>   if (err)
>   goto out_err;
>  
> @@ -80,13 +90,20 @@ static int gic__create_irqchip(struct kvm *kvm)
>   return err;
>  }
>  
> -int gic__create(struct kvm *kvm)
> +int gic__create(struct kvm *kvm, enum irqchip_type type)
>  {
>   int err;
>  
> + switch (type) {
> + case IRQCHIP_GICV2:
> + break;
> + default:
> + return -ENODEV;
> + }
> +
>   /* Try the new way first, and fallback on legacy method otherwise */
> - err = gic__create_device(kvm);
> - if (err)
> + err = gic__create_device(kvm, type);
> + if (err && type == IRQCHIP_GICV2)
>   err = gic__create_irqchip(kvm);
>  
>   return err;
> @@ -134,15 +151,24 @@ static int gic__init_gic(struct kvm *kvm)
>  }
>  late_init(gic__init_gic)
>  
> -void gic__generate_fdt_nodes(void *fdt, u32 phandle)
> +void gic__generate_fdt_nodes(void *fdt, u32 phandle, enum irqchip_type type)
>  {
> + const char *compatible;
>   u64 reg_prop[] = {
>   cpu_to_fdt64(ARM_GIC_DIST_BASE), 
> cpu_to_fdt64(ARM_GIC_DIST_SIZE),
>   cpu_to_fdt64(ARM_GIC_CPUI_BASE), 
> cpu_to_fdt64(ARM_GIC_CPUI_SIZE),
>   };
>  
> + switch (type) {
> + case IRQCHIP_GICV2:
> + compatible = "arm,cortex-a15-gic";
> + break;
> + default:
> + return;
> + }
> +
>   _FDT(fdt_begin_node(fdt, "intc"));
> - _FDT(fdt_property_string(fdt, "compatible", "arm,cortex-a15-gic"));
> + _FDT(fdt_property_string(fdt, "compatible", compatible));
>   _FDT(fdt_property_cell(fdt, "#interrupt-cells", GIC_FDT_IRQ_NUM_CELLS));
>   _FDT(fdt_property(fdt, "interrupt-controller", NULL, 0));
>   _FDT(fdt_property(fdt, "reg", reg_prop, sizeof(reg_prop)));
> diff --git a/arm/include/arm-common/gic.h b/arm/include/arm-common/gic.h
> 

Re: [PATCH v3 09/10] arm: add support for supplying GICv3 redistributor addresses

2015-06-17 Thread Marc Zyngier
On 17/06/15 12:22, Andre Przywara wrote:
> Instead of the GIC virtual CPU interface, an emulated GICv3 needs to
> have accesses to its emulated redistributors trapped in the guest.
> Add code to tell the kernel about the mapping if a GICv3 emulation was
> requested by the user.
> 
> This contains some defines which are not (yet) in the (32 bit) header
> files to allow compilation for ARM.
> 
> Signed-off-by: Andre Przywara 
> ---
>  arm/gic.c | 36 +++-
>  arm/include/arm-common/gic.h  |  3 ++-
>  arm/include/arm-common/kvm-arch.h |  7 +++
>  3 files changed, 44 insertions(+), 2 deletions(-)
> 
> diff --git a/arm/gic.c b/arm/gic.c
> index b6c5868..efe4b42 100644
> --- a/arm/gic.c
> +++ b/arm/gic.c
> @@ -9,7 +9,18 @@
>  #include 
>  #include 
>  
> +/* Those names are not defined for ARM (yet) */
> +#ifndef KVM_VGIC_V3_ADDR_TYPE_DIST
> +#define KVM_VGIC_V3_ADDR_TYPE_DIST 2
> +#endif
> +
> +#ifndef KVM_VGIC_V3_ADDR_TYPE_REDIST
> +#define KVM_VGIC_V3_ADDR_TYPE_REDIST 3
> +#endif
> +
>  static int gic_fd = -1;
> +static u64 gic_redists_base;
> +static u64 gic_redists_size;
>  
>  static int gic__create_device(struct kvm *kvm, enum irqchip_type type)
>  {
> @@ -28,12 +39,21 @@ static int gic__create_device(struct kvm *kvm, enum 
> irqchip_type type)
>   .group  = KVM_DEV_ARM_VGIC_GRP_ADDR,
>   .addr   = (u64)(unsigned long)&dist_addr,
>   };
> + struct kvm_device_attr redist_attr = {
> + .group  = KVM_DEV_ARM_VGIC_GRP_ADDR,
> + .attr   = KVM_VGIC_V3_ADDR_TYPE_REDIST,
> + .addr   = (u64)(unsigned long)&gic_redists_base,
> + };
>  
>   switch (type) {
>   case IRQCHIP_GICV2:
>   gic_device.type = KVM_DEV_TYPE_ARM_VGIC_V2;
>   dist_attr.attr  = KVM_VGIC_V2_ADDR_TYPE_DIST;
>   break;
> + case IRQCHIP_GICV3:
> + gic_device.type = KVM_DEV_TYPE_ARM_VGIC_V3;
> + dist_attr.attr  = KVM_VGIC_V3_ADDR_TYPE_DIST;
> + break;
>   }
>  
>   err = ioctl(kvm->vm_fd, KVM_CREATE_DEVICE, &gic_device);
> @@ -46,6 +66,9 @@ static int gic__create_device(struct kvm *kvm, enum 
> irqchip_type type)
>   case IRQCHIP_GICV2:
>   err = ioctl(gic_fd, KVM_SET_DEVICE_ATTR, &cpu_if_attr);
>   break;
> + case IRQCHIP_GICV3:
> + err = ioctl(gic_fd, KVM_SET_DEVICE_ATTR, &redist_attr);
> + break;
>   }
>   if (err)
>   goto out_err;
> @@ -97,6 +120,10 @@ int gic__create(struct kvm *kvm, enum irqchip_type type)
>   switch (type) {
>   case IRQCHIP_GICV2:
>   break;
> + case IRQCHIP_GICV3:
> + gic_redists_size = kvm->cfg.nrcpus * ARM_GIC_REDIST_SIZE;
> + gic_redists_base = ARM_GIC_DIST_BASE - gic_redists_size;
> + break;
>   default:
>   return -ENODEV;
>   }
> @@ -156,12 +183,19 @@ void gic__generate_fdt_nodes(void *fdt, u32 phandle, 
> enum irqchip_type type)
>   const char *compatible;
>   u64 reg_prop[] = {
>   cpu_to_fdt64(ARM_GIC_DIST_BASE), 
> cpu_to_fdt64(ARM_GIC_DIST_SIZE),
> - cpu_to_fdt64(ARM_GIC_CPUI_BASE), 
> cpu_to_fdt64(ARM_GIC_CPUI_SIZE),
> + 0, 0,   /* to be filled */
>   };
>  
>   switch (type) {
>   case IRQCHIP_GICV2:
>   compatible = "arm,cortex-a15-gic";
> + reg_prop[2] = cpu_to_fdt64(ARM_GIC_CPUI_BASE);
> + reg_prop[3] = cpu_to_fdt64(ARM_GIC_CPUI_SIZE);
> + break;
> + case IRQCHIP_GICV3:
> + compatible = "arm,gic-v3";
> + reg_prop[2] = cpu_to_fdt64(gic_redists_base);
> + reg_prop[3] = cpu_to_fdt64(gic_redists_size);
>   break;
>   default:
>   return;
> diff --git a/arm/include/arm-common/gic.h b/arm/include/arm-common/gic.h
> index 2ed76fa..403d93b 100644
> --- a/arm/include/arm-common/gic.h
> +++ b/arm/include/arm-common/gic.h
> @@ -22,7 +22,8 @@
>  #define GIC_MAX_IRQ  255
>  
>  enum irqchip_type {
> - IRQCHIP_GICV2
> + IRQCHIP_GICV2,
> + IRQCHIP_GICV3

Same remark as for the previous patch.

>  };
>  
>  struct kvm;
> diff --git a/arm/include/arm-common/kvm-arch.h 
> b/arm/include/arm-common/kvm-arch.h
> index 90d6733..0f5fb7f 100644
> --- a/arm/include/arm-common/kvm-arch.h
> +++ b/arm/include/arm-common/kvm-arch.h
> @@ -30,6 +30,13 @@
>  #define KVM_PCI_MMIO_AREA(KVM_PCI_CFG_AREA + ARM_PCI_CFG_SIZE)
>  #define KVM_VIRTIO_MMIO_AREA ARM_MMIO_AREA
>  
> +/*
> + * On a GICv3 there must be one redistributor per vCPU.
> + * The value here is the size for one, we multiply this at runtime with
> + * the number of requested vCPUs to get the actual size.
> + */
> +#define ARM_GIC_REDIST_SIZE  0x2
> +
>  #define KVM_IRQ_OFFSET   GIC_SPI_IRQ_BASE
>  
>  #define KVM_VM_TYPE  0
> 

Reviewed-by: Marc Zyngier 

   

Re: [PATCH 3/5] vhost: support upto 509 memory regions

2015-06-17 Thread Michael S. Tsirkin
On Wed, Jun 17, 2015 at 02:23:39PM +0200, Igor Mammedov wrote:
> On Wed, 17 Jun 2015 13:51:56 +0200
> "Michael S. Tsirkin"  wrote:
> 
> > On Wed, Jun 17, 2015 at 01:48:03PM +0200, Igor Mammedov wrote:
> > > > > So far it's kernel limitation and this patch fixes crashes
> > > > > that users see now, with the rest of patches enabling performance
> > > > > not to regress.
> > > > 
> > > > When I say regression I refer to an option to limit the array
> > > > size again after userspace started using the larger size.
> > > Is there a need to do so?
> > 
> > Considering userspace can be malicious, I guess yes.
> I don't think it's a valid concern in this case,
> setting limit back from 509 to 64 will not help here in any way,
> userspace still can create as many vhost instances as it needs
> to consume memory it desires.

Not really since vhost char device isn't world-accessible.
It's typically opened by a privileged tool, the fd is
then passed to an unprivileged userspace, or permissions dropped.

> > 
> > > Userspace that cares about memory footprint won't use many slots
> > > keeping it low and user space that can't do without many slots
> > > or doesn't care will have bigger memory footprint.
> > 
> > We really can't trust userspace to do the right thing though.
> > 


[PATCH v2 4/6] vhost: translate_desc: optimization for desc.len < region size

2015-06-17 Thread Igor Mammedov
When translating descriptors, they are typically smaller than the
memory region that holds them and translate into a single iov
entry, so it's not necessary to check the remaining length
twice or to calculate the used length and next address
in such cases.

Replace the remaining-length and 'size' increment branches
with a single remaining-length check, and execute the
next-iov steps only when they are needed.

This saves a modest 2% of translate_desc() execution time.

Signed-off-by: Igor Mammedov 
---
PS:
I'm not sure whether iov_size > 0 always holds; if it doesn't,
this patch is better dropped.
---
 drivers/vhost/vhost.c | 21 +
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 5c39a1e..5bcb323 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -,12 +,8 @@ static int translate_desc(struct vhost_virtqueue *vq, 
u64 addr, u32 len,
int ret = 0;
 
mem = vq->memory;
-   while ((u64)len > s) {
+   do {
u64 size;
-   if (unlikely(ret >= iov_size)) {
-   ret = -ENOBUFS;
-   break;
-   }
reg = find_region(mem, addr, len, &vq->cached_reg);
if (unlikely(!reg)) {
ret = -EFAULT;
@@ -1124,13 +1120,22 @@ static int translate_desc(struct vhost_virtqueue *vq, 
u64 addr, u32 len,
}
_iov = iov + ret;
size = reg->memory_size - addr + reg->guest_phys_addr;
-   _iov->iov_len = min((u64)len - s, size);
_iov->iov_base = (void __user *)(unsigned long)
(reg->userspace_addr + addr - reg->guest_phys_addr);
+   ++ret;
+   if (likely((u64)len - s < size)) {
+   _iov->iov_len = (u64)len - s;
+   break;
+   }
+
+   if (unlikely(ret >= iov_size)) {
+   ret = -ENOBUFS;
+   break;
+   }
+   _iov->iov_len = size;
s += size;
addr += size;
-   ++ret;
-   }
+   } while (1);
 
return ret;
 }
-- 
1.8.3.1



Re: [PATCH v3 07/10] limit number of VCPUs on demand

2015-06-17 Thread Andre Przywara
On 06/17/2015 01:53 PM, Marc Zyngier wrote:
> On 17/06/15 12:21, Andre Przywara wrote:
>> Currently the ARM GIC checks the number of VCPUs against a fixed
>> limit, which is GICv2 specific. Don't pretend we know better than the
>> kernel and let's get rid of that explicit check.
>> Instead be more relaxed about KVM_CREATE_VCPU failing with EINVAL,
>> which is the way the kernel communicates having reached a VCPU limit.
>> If we see this and have at least brought up one VCPU already
>> successfully, then don't panic, but limit the number of VCPUs instead.
>>
>> Signed-off-by: Andre Przywara 

...

>> diff --git a/arm/kvm-cpu.c b/arm/kvm-cpu.c
>> index 7780251..c1cf51d 100644
>> --- a/arm/kvm-cpu.c
>> +++ b/arm/kvm-cpu.c
>> @@ -47,12 +47,19 @@ struct kvm_cpu *kvm_cpu__arch_init(struct kvm *kvm, 
>> unsigned long cpu_id)
>>  };
>>  
>>  vcpu = calloc(1, sizeof(struct kvm_cpu));
>> -if (!vcpu)
>> +if (!vcpu) {
>> +errno = ENOMEM;
>>  return NULL;
>> +}
> 
> Isn't errno already set when calloc fails?

Ah yes, that seems to be true at least for glibc or UNIX 98, according
to the manpage. I was misled by the fact that calloc is not a
syscall. So I can drop this hunk.

Thanks,
Andre.


[PATCH v2 6/6] vhost: support upto 509 memory regions

2015-06-17 Thread Igor Mammedov
Since commit
 1d4e7e3 kvm: x86: increase user memory slots to 509

it has been possible to use a bigger number of memory
slots, which memory hotplug uses for registering
hotplugged memory.
However, QEMU crashes if it's used with more than ~60
pc-dimm devices and vhost-net, since the host kernel's
vhost-net module refuses to accept more than 64
memory regions.

Increase the VHOST_MEMORY_MAX_NREGIONS limit from 64 to 509
to match KVM_USER_MEM_SLOTS, fixing the issue for vhost-net
with current QEMU versions.

Signed-off-by: Igor Mammedov 
---
 drivers/vhost/vhost.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 78290b7..e93023e 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -37,7 +37,7 @@ MODULE_PARM_DESC(translation_cache,
" Set to 0 to disable. (default: 1)");
 
 enum {
-   VHOST_MEMORY_MAX_NREGIONS = 64,
+   VHOST_MEMORY_MAX_NREGIONS = 509,
VHOST_MEMORY_F_LOG = 0x1,
 };
 
-- 
1.8.3.1



[PATCH v2 5/6] vhost: add 'translation_cache' module parameter

2015-06-17 Thread Igor Mammedov
By default, translation of virtqueue descriptors is done with
caching enabled, but caching only adds extra cost for a
thrashing workload where the majority of descriptors
are translated to different memory regions.
So add an option that allows excluding the cache-miss cost in such cases.

Performance with caching enabled for a sequential workload
doesn't seem to be affected much versus the version without the static-key
switch, i.e. still the same 0.2% of total time, with the key (NOPs)
consuming 5ms over a 5-minute workload.

Signed-off-by: Igor Mammedov 
---
I don't have a test case for a thrashing workload though, but the jmp
instruction adds up to ~6ms (55M instructions), versus the excluded
caching cost of around 24ms over a 5-minute workload.
---
 drivers/vhost/vhost.c | 20 
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 5bcb323..78290b7 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -29,6 +29,13 @@
 
 #include "vhost.h"
 
+struct static_key translation_cache_key = STATIC_KEY_INIT_TRUE;
+static bool translation_cache = true;
+module_param(translation_cache, bool, 0444);
+MODULE_PARM_DESC(translation_cache,
+   "Enables/disables virtqueue descriptor translation caching,"
+   " Set to 0 to disable. (default: 1)");
+
 enum {
VHOST_MEMORY_MAX_NREGIONS = 64,
VHOST_MEMORY_F_LOG = 0x1,
@@ -944,10 +951,12 @@ static const struct vhost_memory_region 
*find_region(struct vhost_memory *mem,
const struct vhost_memory_region *reg;
int start = 0, end = mem->nregions;
 
-   reg = mem->regions + *cached_reg;
-   if (likely(addr >= reg->guest_phys_addr &&
-   reg->guest_phys_addr + reg->memory_size > addr))
-   return reg;
+   if (static_key_true(&translation_cache_key)) {
+   reg = mem->regions + *cached_reg;
+   if (likely(addr >= reg->guest_phys_addr &&
+   reg->guest_phys_addr + reg->memory_size > addr))
+   return reg;
+   }
 
while (start < end) {
int slot = start + (end - start) / 2;
@@ -1612,6 +1621,9 @@ EXPORT_SYMBOL_GPL(vhost_disable_notify);
 
 static int __init vhost_init(void)
 {
+   if (!translation_cache)
+   static_key_slow_dec(&translation_cache_key);
+
return 0;
 }
 
-- 
1.8.3.1



[PATCH v2 1/6] vhost: use binary search instead of linear in find_region()

2015-06-17 Thread Igor Mammedov
For default region layouts, performance stays the same
as with linear search, i.e. it takes around 210ns on average for
translate_desc(), which inlines find_region().

But it scales better with a larger number of regions:
235ns for binary search vs. 300ns for linear search with 55 memory regions,
and the values should stay about the same when the allowed number
of slots is increased to 509, as has been done in KVM.

Signed-off-by: Igor Mammedov 
---
v2:
  move kvfree() to 2/2 where it belongs
---
 drivers/vhost/vhost.c | 36 +++-
 1 file changed, 27 insertions(+), 9 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 2ee2826..f1e07b8 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "vhost.h"
 
@@ -590,6 +591,16 @@ int vhost_vq_access_ok(struct vhost_virtqueue *vq)
 }
 EXPORT_SYMBOL_GPL(vhost_vq_access_ok);
 
+static int vhost_memory_reg_sort_cmp(const void *p1, const void *p2)
+{
+   const struct vhost_memory_region *r1 = p1, *r2 = p2;
+   if (r1->guest_phys_addr < r2->guest_phys_addr)
+   return 1;
+   if (r1->guest_phys_addr > r2->guest_phys_addr)
+   return -1;
+   return 0;
+}
+
 static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user 
*m)
 {
struct vhost_memory mem, *newmem, *oldmem;
@@ -612,6 +623,8 @@ static long vhost_set_memory(struct vhost_dev *d, struct 
vhost_memory __user *m)
kfree(newmem);
return -EFAULT;
}
+   sort(newmem->regions, newmem->nregions, sizeof(*newmem->regions),
+   vhost_memory_reg_sort_cmp, NULL);
 
if (!memory_access_ok(d, newmem, 0)) {
kfree(newmem);
@@ -913,17 +926,22 @@ EXPORT_SYMBOL_GPL(vhost_dev_ioctl);
 static const struct vhost_memory_region *find_region(struct vhost_memory *mem,
 __u64 addr, __u32 len)
 {
-   struct vhost_memory_region *reg;
-   int i;
+   const struct vhost_memory_region *reg;
+   int start = 0, end = mem->nregions;
 
-   /* linear search is not brilliant, but we really have on the order of 6
-* regions in practice */
-   for (i = 0; i < mem->nregions; ++i) {
-   reg = mem->regions + i;
-   if (reg->guest_phys_addr <= addr &&
-   reg->guest_phys_addr + reg->memory_size - 1 >= addr)
-   return reg;
+   while (start < end) {
+   int slot = start + (end - start) / 2;
+   reg = mem->regions + slot;
+   if (addr >= reg->guest_phys_addr)
+   end = slot;
+   else
+   start = slot + 1;
}
+
+   reg = mem->regions + start;
+   if (addr >= reg->guest_phys_addr &&
+   reg->guest_phys_addr + reg->memory_size > addr)
+   return reg;
return NULL;
 }
 
-- 
1.8.3.1



Re: [PATCH v3 10/10] arm: use new irqchip parameter to create different vGIC types

2015-06-17 Thread Marc Zyngier
On 17/06/15 12:22, Andre Przywara wrote:
> Currently we unconditionally create a virtual GICv2 in the guest.
> Add a --irqchip= parameter to let the user specify a different GIC
> type for the guest.
> For now we the only other supported type is GICv3.

Superfluous "we".

Also, spelling out the expected values would be a good thing
(--irqchip=GICv3 would fail).

> 
> Signed-off-by: Andre Przywara 
> ---
>  arm/aarch64/arm-cpu.c|  2 +-
>  arm/gic.c| 17 +
>  arm/include/arm-common/kvm-config-arch.h |  9 -
>  arm/kvm.c|  2 +-
>  4 files changed, 27 insertions(+), 3 deletions(-)
> 
> diff --git a/arm/aarch64/arm-cpu.c b/arm/aarch64/arm-cpu.c
> index f702b9e..3dc8ea3 100644
> --- a/arm/aarch64/arm-cpu.c
> +++ b/arm/aarch64/arm-cpu.c
> @@ -12,7 +12,7 @@
>  static void generate_fdt_nodes(void *fdt, struct kvm *kvm, u32 gic_phandle)
>  {
>   int timer_interrupts[4] = {13, 14, 11, 10};
> - gic__generate_fdt_nodes(fdt, gic_phandle, IRQCHIP_GICV2);
> + gic__generate_fdt_nodes(fdt, gic_phandle, kvm->cfg.arch.irqchip);
>   timer__generate_fdt_nodes(fdt, kvm, timer_interrupts);
>  }
>  
> diff --git a/arm/gic.c b/arm/gic.c
> index efe4b42..5b49416 100644
> --- a/arm/gic.c
> +++ b/arm/gic.c
> @@ -22,6 +22,23 @@ static int gic_fd = -1;
>  static u64 gic_redists_base;
>  static u64 gic_redists_size;
>  
> +int irqchip_parser(const struct option *opt, const char *arg, int unset)
> +{
> + enum irqchip_type *type = opt->value;
> +
> + *type = IRQCHIP_GICV2;
> + if (!strcmp(arg, "gicv2")) {
> + *type = IRQCHIP_GICV2;
> + } else if (!strcmp(arg, "gicv3")) {
> + *type = IRQCHIP_GICV3;
> + } else if (strcmp(arg, "default")) {
> + fprintf(stderr, "irqchip: unknown type \"%s\"\n", arg);
> + return -1;
> + }
> +
> + return 0;
> +}
> +
>  static int gic__create_device(struct kvm *kvm, enum irqchip_type type)
>  {
>   int err;
> diff --git a/arm/include/arm-common/kvm-config-arch.h 
> b/arm/include/arm-common/kvm-config-arch.h
> index a8ebd94..9529881 100644
> --- a/arm/include/arm-common/kvm-config-arch.h
> +++ b/arm/include/arm-common/kvm-config-arch.h
> @@ -8,8 +8,11 @@ struct kvm_config_arch {
>   unsigned intforce_cntfrq;
>   boolvirtio_trans_pci;
>   boolaarch32_guest;
> + enum irqchip_type irqchip;
>  };
>  
> +int irqchip_parser(const struct option *opt, const char *arg, int unset);
> +
>  #define OPT_ARCH_RUN(pfx, cfg)   
> \
>   pfx,
> \
>   ARM_OPT_ARCH_RUN(cfg)   
> \
> @@ -21,6 +24,10 @@ struct kvm_config_arch {
>"updated to program CNTFRQ correctly*"),   
> \
>   OPT_BOOLEAN('\0', "force-pci", &(cfg)->virtio_trans_pci,
> \
>   "Force virtio devices to use PCI as their default " 
> \
> - "transport"),
> + "transport"),   
> \
> +OPT_CALLBACK('\0', "irqchip", &(cfg)->irqchip,   
> \
> +  "[gicv2|gicv3]",   \

Looks like "default" is also an acceptable string, which you don't
document. I'd be inclined to remove the "default" handling altogether,
and document that without --irqchip, you get a GICv2. At some point, it
would be good to have a --irqchip=host, but we need some additional
kernel support for this.

> +  "type of interrupt controller to emulate in the guest",
> \
> +  irqchip_parser, NULL),
>  
>  #endif /* ARM_COMMON__KVM_CONFIG_ARCH_H */
> diff --git a/arm/kvm.c b/arm/kvm.c
> index f9685c2..d0e4a20 100644
> --- a/arm/kvm.c
> +++ b/arm/kvm.c
> @@ -82,6 +82,6 @@ void kvm__arch_init(struct kvm *kvm, const char 
> *hugetlbfs_path, u64 ram_size)
>   MADV_MERGEABLE | MADV_HUGEPAGE);
>  
>   /* Create the virtual GIC. */
> - if (gic__create(kvm, IRQCHIP_GICV2))
> + if (gic__create(kvm, kvm->cfg.arch.irqchip))
>   die("Failed to create virtual GIC");
>  }
> 

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


[PATCH v2 2/6] vhost: extend memory regions allocation to vmalloc

2015-06-17 Thread Igor Mammedov
With a large number of memory regions we could end up with
high-order allocations, and kmalloc could fail if the
host is under memory pressure.
Considering that the memory regions array is used on the hot path,
try harder to allocate using kmalloc, and if that fails resort
to vmalloc.
It's still better than just failing vhost_set_memory() and
causing a guest crash when new memory is hotplugged
into the guest.

I'll still look at a QEMU-side solution to reduce the number of
memory regions it feeds to vhost to make things even better,
but it doesn't hurt for the kernel to behave smarter and not
crash older QEMUs that may use a large number of memory
regions.

Signed-off-by: Igor Mammedov 
---
 drivers/vhost/vhost.c | 22 +-
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index f1e07b8..99931a0 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -471,7 +471,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev, bool locked)
fput(dev->log_file);
dev->log_file = NULL;
/* No one will access memory at this point */
-   kfree(dev->memory);
+   kvfree(dev->memory);
dev->memory = NULL;
WARN_ON(!list_empty(&dev->work_list));
if (dev->worker) {
@@ -601,6 +601,18 @@ static int vhost_memory_reg_sort_cmp(const void *p1, const 
void *p2)
return 0;
 }
 
+static void *vhost_kvzalloc(unsigned long size)
+{
+   void *n = kzalloc(size, GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT);
+
+   if (!n) {
+   n = vzalloc(size);
+   if (!n)
+   return ERR_PTR(-ENOMEM);
+   }
+   return n;
+}
+
 static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user 
*m)
 {
struct vhost_memory mem, *newmem, *oldmem;
@@ -613,21 +625,21 @@ static long vhost_set_memory(struct vhost_dev *d, struct 
vhost_memory __user *m)
return -EOPNOTSUPP;
if (mem.nregions > VHOST_MEMORY_MAX_NREGIONS)
return -E2BIG;
-   newmem = kmalloc(size + mem.nregions * sizeof *m->regions, GFP_KERNEL);
+   newmem = vhost_kvzalloc(size + mem.nregions * sizeof(*m->regions));
if (!newmem)
return -ENOMEM;
 
memcpy(newmem, &mem, size);
if (copy_from_user(newmem->regions, m->regions,
   mem.nregions * sizeof *m->regions)) {
-   kfree(newmem);
+   kvfree(newmem);
return -EFAULT;
}
sort(newmem->regions, newmem->nregions, sizeof(*newmem->regions),
vhost_memory_reg_sort_cmp, NULL);
 
if (!memory_access_ok(d, newmem, 0)) {
-   kfree(newmem);
+   kvfree(newmem);
return -EFAULT;
}
oldmem = d->memory;
@@ -639,7 +651,7 @@ static long vhost_set_memory(struct vhost_dev *d, struct 
vhost_memory __user *m)
d->vqs[i]->memory = newmem;
mutex_unlock(&d->vqs[i]->mutex);
}
-   kfree(oldmem);
+   kvfree(oldmem);
return 0;
 }
 
-- 
1.8.3.1



[PATCH v2 3/6] vhost: add per VQ memory region caching

2015-06-17 Thread Igor Mammedov
This brings the translate_desc() cost down to around 210ns
if the accessed descriptors are from the same memory region.

Signed-off-by: Igor Mammedov 
---
That was the case for the netperf/iperf workloads used during testing.
---
 drivers/vhost/vhost.c | 16 +---
 drivers/vhost/vhost.h |  1 +
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 99931a0..5c39a1e 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -200,6 +200,7 @@ static void vhost_vq_reset(struct vhost_dev *dev,
vq->call = NULL;
vq->log_ctx = NULL;
vq->memory = NULL;
+   vq->cached_reg = 0;
 }
 
 static int vhost_worker(void *data)
@@ -649,6 +650,7 @@ static long vhost_set_memory(struct vhost_dev *d, struct 
vhost_memory __user *m)
for (i = 0; i < d->nvqs; ++i) {
mutex_lock(&d->vqs[i]->mutex);
d->vqs[i]->memory = newmem;
+   d->vqs[i]->cached_reg = 0;
mutex_unlock(&d->vqs[i]->mutex);
}
kvfree(oldmem);
@@ -936,11 +938,17 @@ done:
 EXPORT_SYMBOL_GPL(vhost_dev_ioctl);
 
 static const struct vhost_memory_region *find_region(struct vhost_memory *mem,
-__u64 addr, __u32 len)
+__u64 addr, __u32 len,
+int *cached_reg)
 {
const struct vhost_memory_region *reg;
int start = 0, end = mem->nregions;
 
+   reg = mem->regions + *cached_reg;
+   if (likely(addr >= reg->guest_phys_addr &&
+   reg->guest_phys_addr + reg->memory_size > addr))
+   return reg;
+
while (start < end) {
int slot = start + (end - start) / 2;
reg = mem->regions + slot;
@@ -952,8 +960,10 @@ static const struct vhost_memory_region 
*find_region(struct vhost_memory *mem,
 
reg = mem->regions + start;
if (addr >= reg->guest_phys_addr &&
-   reg->guest_phys_addr + reg->memory_size > addr)
+   reg->guest_phys_addr + reg->memory_size > addr) {
+   *cached_reg = start;
return reg;
+   }
return NULL;
 }
 
@@ -1107,7 +1117,7 @@ static int translate_desc(struct vhost_virtqueue *vq, u64 
addr, u32 len,
ret = -ENOBUFS;
break;
}
-   reg = find_region(mem, addr, len);
+   reg = find_region(mem, addr, len, &vq->cached_reg);
if (unlikely(!reg)) {
ret = -EFAULT;
break;
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 8c1c792..68bd00f 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -106,6 +106,7 @@ struct vhost_virtqueue {
/* Log write descriptors */
void __user *log_base;
struct vhost_log *log;
+   int cached_reg;
 };
 
 struct vhost_dev {
-- 
1.8.3.1



Re: [PATCH 3/5] vhost: support upto 509 memory regions

2015-06-17 Thread Paolo Bonzini


On 17/06/2015 15:13, Michael S. Tsirkin wrote:
> > > Considering userspace can be malicious, I guess yes.
> > I don't think it's a valid concern in this case,
> > setting limit back from 509 to 64 will not help here in any way,
> > userspace still can create as many vhost instances as it needs
> > to consume memory it desires.
> 
> Not really since vhost char device isn't world-accessible.
> It's typically opened by a privileged tool, the fd is
> then passed to an unprivileged userspace, or permissions dropped.

Then what's the concern anyway?

Paolo


[PATCH v2 0/6] vhost: support upto 509 memory regions

2015-06-17 Thread Igor Mammedov
Reference to the previous version's discussion:
[PATCH 0/5] vhost: support upto 509 memory regions
http://www.spinics.net/lists/kvm/msg117654.html

Changelog v1->v2:
  * fix spelling errors
  * move "vhost: support upto 509 memory regions" to the end of the queue
  * move kvfree() from 1/6 to 2/6 where it belongs
  * add vhost module parameter to enable/disable translation caching

This series extends vhost to support up to 509 memory regions,
and adds some vhost translate_desc() performance improvements
so it won't regress when memslots are increased to 509.

It fixes a running VM crashing during memory hotplug due
to vhost refusing to accept more than 64 memory regions.

It's a host-kernel-side-only fix to make things work with QEMU
versions that support memory hotplug. But I'll continue
to work on a QEMU-side solution to reduce the number of memory
regions to make things even better.

Performance-wise, for a guest with (in my case) 3 memory regions
and netperf's UDP_RR workload, translate_desc() execution
time as a share of the total workload is:

Memory  |1G RAM|cached|non cached
regions #   |  3   |  53  |  53

upstream| 0.3% |  -   | 3.5%

this series | 0.2% | 0.5% | 0.7%

where the "non cached" column reflects a thrashing workload
with constant cache misses. More details on timing are in the
respective patches.

Igor Mammedov (6):
  vhost: use binary search instead of linear in find_region()
  vhost: extend memory regions allocation to vmalloc
  vhost: add per VQ memory region caching
  vhost: translate_desc: optimization for desc.len < region size
  vhost: add 'translation_cache' module parameter
  vhost: support upto 509 memory regions

 drivers/vhost/vhost.c | 105 ++
 drivers/vhost/vhost.h |   1 +
 2 files changed, 82 insertions(+), 24 deletions(-)

-- 
1.8.3.1



Re: [PATCH 04/10] KVM: arm/arm64: vgic: Allow HW irq to be encoded in LR

2015-06-17 Thread Peter Maydell
On 17 June 2015 at 12:53, Eric Auger  wrote:
> shouldn't we test somewhere that the hwirq is between 16 and 1019.

Not directly related, but that reminds me that I noticed the
other day that we have VGIC_MAX_IRQS = 1024 (and use that as a
guard on how many irqs we let userspace configure and ask us
to deliver), but that doesn't account for the couple of magic
numbers at the top of the range. I think that lets userspace
cause us to do UNPREDICTABLE things to the GIC...

-- PMM


Re: [PATCH v3 17/18] x86/kvm/tsc: Drop extra barrier and use rdtsc_ordered in kvmclock

2015-06-17 Thread Paolo Bonzini


On 17/06/2015 09:47, Paolo Bonzini wrote:
> 
> 
> On 17/06/2015 02:36, Andy Lutomirski wrote:
>> __pvclock_read_cycles had an unnecessary barrier.  Get rid of that
>> barrier and clean up the code by just using rdtsc_ordered().
>>
>> Cc: Paolo Bonzini 
>> Cc: Radim Krcmar 
>> Cc: Marcelo Tosatti 
>> Cc: kvm@vger.kernel.org
>> Signed-off-by: Andy Lutomirski 
>> ---
>>
>> I'm hoping to get an ack for this to go in through -tip.  (Arguably
>> I'm the maintainer of this code given how it's used, but I should
>> still ask for an ack.)
>>
>> arch/x86/include/asm/pvclock.h | 21 -
>>  1 file changed, 12 insertions(+), 9 deletions(-)
> 
> Can you send a URL to the rest of the series?  I've never even seen v1
> or v2 so I have no idea of what this is about.

Ah, it was sent to the KVM list, just not CCed to me. :)

Sorry, that's what you get when your unread message count does not fit
in three digits anymore.

Paolo


Re: [PATCH v3 02/18] x86/msr/kvm: Remove vget_cycles()

2015-06-17 Thread Paolo Bonzini


On 17/06/2015 02:35, Andy Lutomirski wrote:
> The only caller was kvm's read_tsc.  The only difference between
> vget_cycles and native_read_tsc was that vget_cycles returned zero
> instead of crashing on TSC-less systems.  KVM's already checks
> vclock_mode before calling that function, so the extra check is
> unnecessary.

Or more simply, KVM (host-side) requires the TSC to exist.

Acked-by: Paolo Bonzini 

> (Off-topic, but the whole KVM clock host implementation is gross.
>  IMO it should be rewritten.)
> 
> Signed-off-by: Andy Lutomirski 
> ---
>  arch/x86/include/asm/tsc.h | 13 -
>  arch/x86/kvm/x86.c |  2 +-
>  2 files changed, 1 insertion(+), 14 deletions(-)
> 
> diff --git a/arch/x86/include/asm/tsc.h b/arch/x86/include/asm/tsc.h
> index fd11128faf25..3da1cc1218ac 100644
> --- a/arch/x86/include/asm/tsc.h
> +++ b/arch/x86/include/asm/tsc.h
> @@ -32,19 +32,6 @@ static inline cycles_t get_cycles(void)
>   return ret;
>  }
>  
> -static __always_inline cycles_t vget_cycles(void)
> -{
> - /*
> -  * We only do VDSOs on TSC capable CPUs, so this shouldn't
> -  * access boot_cpu_data (which is not VDSO-safe):
> -  */
> -#ifndef CONFIG_X86_TSC
> - if (!cpu_has_tsc)
> - return 0;
> -#endif
> - return (cycles_t)native_read_tsc();
> -}
> -
>  extern void tsc_init(void);
>  extern void mark_tsc_unstable(char *reason);
>  extern int unsynchronized_tsc(void);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 26eaeb522cab..c26faf408bce 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1430,7 +1430,7 @@ static cycle_t read_tsc(void)
>* but no one has ever seen it happen.
>*/
>   rdtsc_barrier();
> - ret = (cycle_t)vget_cycles();
> + ret = (cycle_t)native_read_tsc();
>  
>   last = pvclock_gtod_data.clock.cycle_last;
>  
> 


Re: [PATCH 04/10] KVM: arm/arm64: vgic: Allow HW irq to be encoded in LR

2015-06-17 Thread Marc Zyngier
On 17/06/15 14:21, Peter Maydell wrote:
> On 17 June 2015 at 12:53, Eric Auger  wrote:
>> shouldn't we test somewhere that the hwirq is between 16 and 1019.
> 
> Not directly related, but that reminds me that I noticed the
> other day that we have VGIC_MAX_IRQS = 1024 (and use that as a
> guard on how many irqs we let userspace configure and ask us
> to deliver), but that doesn't account for the couple of magic
> numbers at the top of the range. I think that lets userspace
> cause us to do UNPREDICTABLE things to the GIC...

Good point. How about the following:

diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 78fb820..950064a 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -1561,7 +1561,7 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, 
unsigned int irq_num,
goto out;
}
 
-   if (irq_num >= kvm->arch.vgic.nr_irqs)
+   if (irq_num >= min(kvm->arch.vgic.nr_irqs, 1020))
return -EINVAL;
 
vcpu_id = vgic_update_irq_pending(kvm, cpuid, irq_num, level);
@@ -2161,10 +2161,7 @@ int kvm_set_irq(struct kvm *kvm, int irq_source_id,
 
BUG_ON(!vgic_initialized(kvm));
 
-   if (spi > kvm->arch.vgic.nr_irqs)
-   return -EINVAL;
return kvm_vgic_inject_irq(kvm, 0, spi, level);
-
 }
 
 /* MSI not implemented yet */

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH v3 00/18] x86/tsc: Clean up rdtsc helpers

2015-06-17 Thread Paolo Bonzini


On 17/06/2015 13:11, Borislav Petkov wrote:
> peterz reminded me that I'm lazy actually and don't reply to each patch :)
> 
> So, I like it, looks good, nice cleanup. It boots on my guest here - I
> haven't done any baremetal testing though. Let's give people some more
> time to look at it...

Same here.  I just remarked on some commit messages and comments.

Paolo


Re: [PATCH 2/2] qemu/kvm: kvm guest crash event handling

2015-06-17 Thread Paolo Bonzini


On 11/06/2015 15:18, Denis V. Lunev wrote:
> From: Andrey Smetanin 
> 
> KVM Hyper-V based guests can notify the hypervisor about a
> guest crash. This patch handles the KVM crash event
> by sending libvirt a guest panic event, which allows
> QEMU/libvirt to gather a guest crash dump.
> 
> The idea is to provide functionality equivalent to the pvpanic
> device, without requiring the QEMU guest agent for Windows.
> 
> The idea is borrowed from the Linux Hyper-V bus driver and was
> validated against Windows 2k12.
> 
> Signed-off-by: Andrey Smetanin 
> Signed-off-by: Denis V. Lunev 
> CC: Gleb Natapov 
> CC: Paolo Bonzini 
> ---
>  include/sysemu/sysemu.h|  2 ++
>  kvm-all.c  |  8 
>  linux-headers/asm-x86/hyperv.h |  2 ++
>  linux-headers/linux/kvm.h  | 11 +++
>  target-i386/cpu-qom.h  |  1 +
>  target-i386/cpu.c  |  1 +
>  target-i386/kvm.c  |  4 
>  vl.c   | 31 +++
>  8 files changed, 60 insertions(+)
> 
> diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> index 853d90a..82d3213 100644
> --- a/include/sysemu/sysemu.h
> +++ b/include/sysemu/sysemu.h
> @@ -61,6 +61,8 @@ void qemu_system_shutdown_request(void);
>  void qemu_system_powerdown_request(void);
>  void qemu_register_powerdown_notifier(Notifier *notifier);
>  void qemu_system_debug_request(void);
> +void qemu_system_crash_request(uint64_t p0, uint64_t p1, uint64_t p2,
> +uint64_t p3, uint64_t p4);
>  void qemu_system_vmstop_request(RunState reason);
>  void qemu_system_vmstop_request_prepare(void);
>  int qemu_shutdown_requested_get(void);
> diff --git a/kvm-all.c b/kvm-all.c
> index 53e01d4..cee23bc 100644
> --- a/kvm-all.c
> +++ b/kvm-all.c
> @@ -1844,6 +1844,14 @@ int kvm_cpu_exec(CPUState *cpu)
>  qemu_system_reset_request();
>  ret = EXCP_INTERRUPT;
>  break;
> +case KVM_SYSTEM_EVENT_CRASH:
> +qemu_system_crash_request(run->system_event.crash.p0,
> +run->system_event.crash.p1,
> +run->system_event.crash.p2,
> +run->system_event.crash.p3,
> +run->system_event.crash.p4);

This needs to be synchronous, so you can do it here:

qapi_event_send_guest_panicked(GUEST_PANIC_ACTION_PAUSE, &error_abort);
vm_stop(RUN_STATE_GUEST_PANICKED);

The five values are never read back.  Please include them in the CPU
state and in the migration stream.  Migration to file is commonly used
to gather post mortem dumps, and there are even tools to convert the
migration file format to Windows crash dump format.  The tools could be
improved to find the crash data and populate the appropriate fields in
the dump file's header.

Paolo

> +ret = 0;
> +break;
>  default:
>  DPRINTF("kvm_arch_handle_exit\n");
>  ret = kvm_arch_handle_exit(cpu, run);
> diff --git a/linux-headers/asm-x86/hyperv.h b/linux-headers/asm-x86/hyperv.h
> index ce6068d..a5df1ab 100644
> --- a/linux-headers/asm-x86/hyperv.h
> +++ b/linux-headers/asm-x86/hyperv.h
> @@ -108,6 +108,8 @@
>  #define HV_X64_HYPERCALL_PARAMS_XMM_AVAILABLE(1 << 4)
>  /* Support for a virtual guest idle state is available */
>  #define HV_X64_GUEST_IDLE_STATE_AVAILABLE(1 << 5)
> +/* Guest crash data handler available */
> +#define HV_X64_GUEST_CRASH_MSR_AVAILABLE(1 << 10)
>  
>  /*
>   * Implementation recommendations. Indicates which behaviors the hypervisor
> diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
> index fad9e5c..e169602 100644
> --- a/linux-headers/linux/kvm.h
> +++ b/linux-headers/linux/kvm.h
> @@ -317,8 +317,19 @@ struct kvm_run {
>   struct {
>  #define KVM_SYSTEM_EVENT_SHUTDOWN   1
>  #define KVM_SYSTEM_EVENT_RESET  2
> +#define KVM_SYSTEM_EVENT_CRASH  3
>   __u32 type;
>   __u64 flags;
> +union {
> +struct {
> +/* Guest crash related parameters */
> +__u64 p0;
> +__u64 p1;
> +__u64 p2;
> +__u64 p3;
> +__u64 p4;
> +} crash;
> +};
>   } system_event;
>   /* KVM_EXIT_S390_STSI */
>   struct {
> diff --git a/target-i386/cpu-qom.h b/target-i386/cpu-qom.h
> index 7a4fddd..c35b624 100644
> --- a/target-i386/cpu-qom.h
> +++ b/target-i386/cpu-qom.h
> @@ -89,6 +89,7 @@ typedef struct X86CPU {
>  bool hyperv_rela

Re: [PATCH v3 06/10] arm: simplify MMIO dispatching

2015-06-17 Thread Andre Przywara
Hi Marc,

On 06/17/2015 01:48 PM, Marc Zyngier wrote:
> On 17/06/15 12:21, Andre Przywara wrote:
>> Currently we separate any incoming MMIO request into one of the ARM
>> memory map regions and take care to spare the GIC.
>> It turns out that this is unnecessary, as we only have one special
>> region (the IO port area in the first 64 KByte). The MMIO rbtree
>> takes care about unhandled MMIO ranges, so we can simply drop all the
>> special range checking (except that for the IO range) in
>> kvm_cpu__emulate_mmio().
>> As the GIC is handled in the kernel, a GIC MMIO access should never
>> reach userland (and we don't know what to do with it anyway).
>> This lets us delete some more code and simplifies future extensions
>> (like expanding the GIC regions).
>> To be in line with the other architectures, move the now simpler
>> code into a header file.
>>
>> Signed-off-by: Andre Przywara 
>> ---
>>  arm/include/arm-common/kvm-arch.h | 12 
>>  arm/include/arm-common/kvm-cpu-arch.h | 14 --
>>  arm/kvm-cpu.c | 16 
>>  3 files changed, 12 insertions(+), 30 deletions(-)
>>
>> diff --git a/arm/include/arm-common/kvm-arch.h 
>> b/arm/include/arm-common/kvm-arch.h
>> index 082131d..90d6733 100644
>> --- a/arm/include/arm-common/kvm-arch.h
>> +++ b/arm/include/arm-common/kvm-arch.h
>> @@ -45,18 +45,6 @@ static inline bool arm_addr_in_ioport_region(u64 
>> phys_addr)
>>  return phys_addr >= KVM_IOPORT_AREA && phys_addr < limit;
>>  }
>>  
>> -static inline bool arm_addr_in_virtio_mmio_region(u64 phys_addr)
>> -{
>> -u64 limit = KVM_VIRTIO_MMIO_AREA + ARM_VIRTIO_MMIO_SIZE;
>> -return phys_addr >= KVM_VIRTIO_MMIO_AREA && phys_addr < limit;
>> -}
>> -
>> -static inline bool arm_addr_in_pci_region(u64 phys_addr)
>> -{
>> -u64 limit = KVM_PCI_CFG_AREA + ARM_PCI_CFG_SIZE + ARM_PCI_MMIO_SIZE;
>> -return phys_addr >= KVM_PCI_CFG_AREA && phys_addr < limit;
>> -}
>> -
>>  struct kvm_arch {
>>  /*
>>   * We may have to align the guest memory for virtio, so keep the
>> diff --git a/arm/include/arm-common/kvm-cpu-arch.h 
>> b/arm/include/arm-common/kvm-cpu-arch.h
>> index 36c7872..329979a 100644
>> --- a/arm/include/arm-common/kvm-cpu-arch.h
>> +++ b/arm/include/arm-common/kvm-cpu-arch.h
>> @@ -44,8 +44,18 @@ static inline bool kvm_cpu__emulate_io(struct kvm_cpu 
>> *vcpu, u16 port, void *dat
>>  return false;
>>  }
>>  
>> -bool kvm_cpu__emulate_mmio(struct kvm_cpu *vcpu, u64 phys_addr, u8 *data,
>> -   u32 len, u8 is_write);
>> +static inline bool kvm_cpu__emulate_mmio(struct kvm_cpu *vcpu, u64 
>> phys_addr,
>> + u8 *data, u32 len, u8 is_write)
>> +{
>> +if (arm_addr_in_ioport_region(phys_addr)) {
>> +int direction = is_write ? KVM_EXIT_IO_OUT : KVM_EXIT_IO_IN;
>> +u16 port = (phys_addr - KVM_IOPORT_AREA) & USHRT_MAX;
>> +
>> +return kvm__emulate_io(vcpu, port, data, direction, len, 1);
>> +}
>> +
>> +return kvm__emulate_mmio(vcpu, phys_addr, data, len, is_write);
>> +}
>>  
>>  unsigned long kvm_cpu__get_vcpu_mpidr(struct kvm_cpu *vcpu);
>>  
>> diff --git a/arm/kvm-cpu.c b/arm/kvm-cpu.c
>> index ab08815..7780251 100644
>> --- a/arm/kvm-cpu.c
>> +++ b/arm/kvm-cpu.c
>> @@ -139,22 +139,6 @@ bool kvm_cpu__handle_exit(struct kvm_cpu *vcpu)
>>  return false;
>>  }
>>  
>> -bool kvm_cpu__emulate_mmio(struct kvm_cpu *vcpu, u64 phys_addr, u8 *data,
>> -   u32 len, u8 is_write)
>> -{
>> -if (arm_addr_in_virtio_mmio_region(phys_addr)) {
>> -return kvm__emulate_mmio(vcpu, phys_addr, data, len, is_write);
>> -} else if (arm_addr_in_ioport_region(phys_addr)) {
>> -int direction = is_write ? KVM_EXIT_IO_OUT : KVM_EXIT_IO_IN;
>> -u16 port = (phys_addr - KVM_IOPORT_AREA) & USHRT_MAX;
>> -return kvm__emulate_io(vcpu, port, data, direction, len, 1);
>> -} else if (arm_addr_in_pci_region(phys_addr)) {
>> -return kvm__emulate_mmio(vcpu, phys_addr, data, len, is_write);
>> -}
> 
> Can you explain why this arm_addr_in_pci_region(phys_addr) check has
> disappeared in your updated version on this function? It may be a non
> issue, but I'd very much like to understand.

If you look above, the calls to kvm__emulate_mmio() are exactly the same
for the PCI and the virtio_mmio regions; also, as the areas are
non-overlapping, the if branches can be reordered.
arm_addr_in_virtio_mmio_region() is true between 64K and (1GB - GIC),
while arm_addr_in_pci_region() is true between 1GB and 2GB.

So this translates into: do kvm__emulate_io() for anything below 64K and
kvm__emulate_mmio() for everything else except for the GIC area,
admittedly in a quite convoluted way.

So my patch just removes the check for the GIC region and rewrites it to
match that description in the last sentence, with the rationale given in
the commit message.
Does that make sense?
If you desperat

Re: [PATCH v3 06/10] arm: simplify MMIO dispatching

2015-06-17 Thread Marc Zyngier
On 17/06/15 14:49, Andre Przywara wrote:
> Hi Marc,
> 
> On 06/17/2015 01:48 PM, Marc Zyngier wrote:
>> On 17/06/15 12:21, Andre Przywara wrote:
>>> Currently we separate any incoming MMIO request into one of the ARM
>>> memory map regions and take care to spare the GIC.
>>> It turns out that this is unnecessary, as we only have one special
>>> region (the IO port area in the first 64 KByte). The MMIO rbtree
>>> takes care about unhandled MMIO ranges, so we can simply drop all the
>>> special range checking (except that for the IO range) in
>>> kvm_cpu__emulate_mmio().
>>> As the GIC is handled in the kernel, a GIC MMIO access should never
>>> reach userland (and we don't know what to do with it anyway).
>>> This lets us delete some more code and simplifies future extensions
>>> (like expanding the GIC regions).
>>> To be in line with the other architectures, move the now simpler
>>> code into a header file.
>>>
>>> Signed-off-by: Andre Przywara 
>>> ---
>>>  arm/include/arm-common/kvm-arch.h | 12 
>>>  arm/include/arm-common/kvm-cpu-arch.h | 14 --
>>>  arm/kvm-cpu.c | 16 
>>>  3 files changed, 12 insertions(+), 30 deletions(-)
>>>
>>> diff --git a/arm/include/arm-common/kvm-arch.h 
>>> b/arm/include/arm-common/kvm-arch.h
>>> index 082131d..90d6733 100644
>>> --- a/arm/include/arm-common/kvm-arch.h
>>> +++ b/arm/include/arm-common/kvm-arch.h
>>> @@ -45,18 +45,6 @@ static inline bool arm_addr_in_ioport_region(u64 
>>> phys_addr)
>>> return phys_addr >= KVM_IOPORT_AREA && phys_addr < limit;
>>>  }
>>>  
>>> -static inline bool arm_addr_in_virtio_mmio_region(u64 phys_addr)
>>> -{
>>> -   u64 limit = KVM_VIRTIO_MMIO_AREA + ARM_VIRTIO_MMIO_SIZE;
>>> -   return phys_addr >= KVM_VIRTIO_MMIO_AREA && phys_addr < limit;
>>> -}
>>> -
>>> -static inline bool arm_addr_in_pci_region(u64 phys_addr)
>>> -{
>>> -   u64 limit = KVM_PCI_CFG_AREA + ARM_PCI_CFG_SIZE + ARM_PCI_MMIO_SIZE;
>>> -   return phys_addr >= KVM_PCI_CFG_AREA && phys_addr < limit;
>>> -}
>>> -
>>>  struct kvm_arch {
>>> /*
>>>  * We may have to align the guest memory for virtio, so keep the
>>> diff --git a/arm/include/arm-common/kvm-cpu-arch.h 
>>> b/arm/include/arm-common/kvm-cpu-arch.h
>>> index 36c7872..329979a 100644
>>> --- a/arm/include/arm-common/kvm-cpu-arch.h
>>> +++ b/arm/include/arm-common/kvm-cpu-arch.h
>>> @@ -44,8 +44,18 @@ static inline bool kvm_cpu__emulate_io(struct kvm_cpu 
>>> *vcpu, u16 port, void *dat
>>> return false;
>>>  }
>>>  
>>> -bool kvm_cpu__emulate_mmio(struct kvm_cpu *vcpu, u64 phys_addr, u8 *data,
>>> -  u32 len, u8 is_write);
>>> +static inline bool kvm_cpu__emulate_mmio(struct kvm_cpu *vcpu, u64 
>>> phys_addr,
>>> +u8 *data, u32 len, u8 is_write)
>>> +{
>>> +   if (arm_addr_in_ioport_region(phys_addr)) {
>>> +   int direction = is_write ? KVM_EXIT_IO_OUT : KVM_EXIT_IO_IN;
>>> +   u16 port = (phys_addr - KVM_IOPORT_AREA) & USHRT_MAX;
>>> +
>>> +   return kvm__emulate_io(vcpu, port, data, direction, len, 1);
>>> +   }
>>> +
>>> +   return kvm__emulate_mmio(vcpu, phys_addr, data, len, is_write);
>>> +}
>>>  
>>>  unsigned long kvm_cpu__get_vcpu_mpidr(struct kvm_cpu *vcpu);
>>>  
>>> diff --git a/arm/kvm-cpu.c b/arm/kvm-cpu.c
>>> index ab08815..7780251 100644
>>> --- a/arm/kvm-cpu.c
>>> +++ b/arm/kvm-cpu.c
>>> @@ -139,22 +139,6 @@ bool kvm_cpu__handle_exit(struct kvm_cpu *vcpu)
>>> return false;
>>>  }
>>>  
>>> -bool kvm_cpu__emulate_mmio(struct kvm_cpu *vcpu, u64 phys_addr, u8 *data,
>>> -  u32 len, u8 is_write)
>>> -{
>>> -   if (arm_addr_in_virtio_mmio_region(phys_addr)) {
>>> -   return kvm__emulate_mmio(vcpu, phys_addr, data, len, is_write);
>>> -   } else if (arm_addr_in_ioport_region(phys_addr)) {
>>> -   int direction = is_write ? KVM_EXIT_IO_OUT : KVM_EXIT_IO_IN;
>>> -   u16 port = (phys_addr - KVM_IOPORT_AREA) & USHRT_MAX;
>>> -   return kvm__emulate_io(vcpu, port, data, direction, len, 1);
>>> -   } else if (arm_addr_in_pci_region(phys_addr)) {
>>> -   return kvm__emulate_mmio(vcpu, phys_addr, data, len, is_write);
>>> -   }
>>
>> Can you explain why this arm_addr_in_pci_region(phys_addr) check has
>> disappeared in your updated version on this function? It may be a non
>> issue, but I'd very much like to understand.
> 
> If you look above the calls to kvm__emulate_mmio() are exactly the same
> for the PCI and the virtio_mmio region, also as the areas are
> non-overlapping the if branches can be reordered.
> arm_addr_in_virtio_mmio_region() is true between 64k and (1GB - GIC),
> while arm_addr_in_pci_region() gives true between 1GB and 2GB.
> 
> So this translates into: do kvm__emulate_io() for anything below 64K and
> kvm__emulate_mmio() for everything else except for the GIC area,
> admittedly in a quite convoluted way.
> 
> So my patch just removes the check for the GIC region and rew

Re: [PATCH 3/5] vhost: support upto 509 memory regions

2015-06-17 Thread Michael S. Tsirkin
On Wed, Jun 17, 2015 at 03:20:44PM +0200, Paolo Bonzini wrote:
> 
> 
> On 17/06/2015 15:13, Michael S. Tsirkin wrote:
> > > > Considering userspace can be malicious, I guess yes.
> > > I don't think it's a valid concern in this case,
> > > setting limit back from 509 to 64 will not help here in any way,
> > > userspace still can create as many vhost instances as it needs
> > > to consume memory it desires.
> > 
> > Not really since vhost char device isn't world-accessible.
> > It's typically opened by a priveledged tool, the fd is
> > then passed to an unpriveledged userspace, or permissions dropped.
> 
> Then what's the concern anyway?
> 
> Paolo

Each fd now ties up 16K of kernel memory.  It didn't use to, so a
privileged tool could safely give the unprivileged userspace
a ton of these fds.

-- 
MST


Re: [PATCH 08/10] KVM: arm/arm64: vgic: Add vgic_{get,set}_phys_irq_active

2015-06-17 Thread Eric Auger
Reviewed-by: Eric Auger 
On 06/08/2015 07:04 PM, Marc Zyngier wrote:
> In order to control the active state of an interrupt, introduce
> a pair of accessors allowing the state to be set/queried.
> 
> This only affects the logical state, and the HW state will only be
> applied at world-switch time.
> 
> Signed-off-by: Marc Zyngier 
> ---
>  include/kvm/arm_vgic.h |  2 ++
>  virt/kvm/arm/vgic.c| 12 
>  2 files changed, 14 insertions(+)
> 
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index 33d121a..1c653c1 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -349,6 +349,8 @@ int kvm_vgic_vcpu_active_irq(struct kvm_vcpu *vcpu);
>  struct irq_phys_map *vgic_map_phys_irq(struct kvm_vcpu *vcpu,
>  int virt_irq, int irq);
>  int vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map);
> +bool vgic_get_phys_irq_active(struct irq_phys_map *map);
> +void vgic_set_phys_irq_active(struct irq_phys_map *map, bool active);
>  
>  #define irqchip_in_kernel(k) (!!((k)->arch.vgic.in_kernel))
>  #define vgic_initialized(k)  (!!((k)->arch.vgic.nr_cpus))
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 495ac7d..f376b56 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -1744,6 +1744,18 @@ static struct irq_phys_map *vgic_irq_map_search(struct 
> kvm_vcpu *vcpu,
>   return this;
>  }
>  
> +bool vgic_get_phys_irq_active(struct irq_phys_map *map)
> +{
> + BUG_ON(!map);
> + return map->active;
> +}
> +
> +void vgic_set_phys_irq_active(struct irq_phys_map *map, bool active)
> +{
> + BUG_ON(!map);
> + map->active = active;
> +}
> +
>  int vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map)
>  {
>   struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
> 



Re: [PATCH 3/5] vhost: support upto 509 memory regions

2015-06-17 Thread Igor Mammedov
On Wed, 17 Jun 2015 16:32:02 +0200
"Michael S. Tsirkin"  wrote:

> On Wed, Jun 17, 2015 at 03:20:44PM +0200, Paolo Bonzini wrote:
> > 
> > 
> > On 17/06/2015 15:13, Michael S. Tsirkin wrote:
> > > > > Considering userspace can be malicious, I guess yes.
> > > > I don't think it's a valid concern in this case,
> > > > setting limit back from 509 to 64 will not help here in any way,
> > > > userspace still can create as many vhost instances as it needs
> > > > to consume memory it desires.
> > > 
> > > Not really since vhost char device isn't world-accessible.
> > > It's typically opened by a priveledged tool, the fd is
> > > then passed to an unpriveledged userspace, or permissions dropped.
> > 
> > Then what's the concern anyway?
> > 
> > Paolo
> 
> Each fd now ties up 16K of kernel memory.  It didn't use to, so
> priveledged tool could safely give the unpriveledged userspace
> a ton of these fds.
If the privileged tool gives out an unlimited number of fds, then it
doesn't matter whether an fd ties up 4K or 16K; the host could still be DoSed.




Re: [PATCH 10/10] KVM: arm/arm64: vgic: Allow non-shared device HW interrupts

2015-06-17 Thread Eric Auger
Hi Marc,
On 06/08/2015 07:04 PM, Marc Zyngier wrote:
> So far, the only use of the HW interrupt facility is the timer,
> implying that the active state is context-switched for each vcpu,
> as the device is is shared across all vcpus.
s/is//
> 
> This does not work for a device that has been assigned to a VM,
> as the guest is entierely in control of that device (the HW is
entirely?
> not shared). In that case, it makes sense to bypass the whole
> active state srtwitchint, and only track the deactivation of the
switching
> interrupt.
> 
> Signed-off-by: Marc Zyngier 
> ---
>  include/kvm/arm_vgic.h|  5 +++--
>  virt/kvm/arm/arch_timer.c |  2 +-
>  virt/kvm/arm/vgic.c   | 37 -
>  3 files changed, 28 insertions(+), 16 deletions(-)
> 
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index 1c653c1..5d47d60 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -164,7 +164,8 @@ struct irq_phys_map {
>   u32 virt_irq;
>   u32 phys_irq;
>   u32 irq;
> - boolactive;
> + boolshared;
> + boolactive; /* Only valid if shared */
>  };
>  
>  struct vgic_dist {
> @@ -347,7 +348,7 @@ void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg);
>  int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
>  int kvm_vgic_vcpu_active_irq(struct kvm_vcpu *vcpu);
>  struct irq_phys_map *vgic_map_phys_irq(struct kvm_vcpu *vcpu,
> -int virt_irq, int irq);
> +int virt_irq, int irq, bool shared);
>  int vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map);
>  bool vgic_get_phys_irq_active(struct irq_phys_map *map);
>  void vgic_set_phys_irq_active(struct irq_phys_map *map, bool active);
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index b9fff78..9544d79 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -202,7 +202,7 @@ void kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
>* Tell the VGIC that the virtual interrupt is tied to a
>* physical interrupt. We do that once per VCPU.
>*/
> - timer->map = vgic_map_phys_irq(vcpu, irq->irq, host_vtimer_irq);
> + timer->map = vgic_map_phys_irq(vcpu, irq->irq, host_vtimer_irq, true);
>   WARN_ON(!timer->map);
>  }
>  
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index f376b56..4223166 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -1125,18 +1125,21 @@ static void vgic_queue_irq_to_lr(struct kvm_vcpu 
> *vcpu, int irq,
>   map = vgic_irq_map_search(vcpu, irq);
>  
>   if (map) {
> - int ret;
> -
> - BUG_ON(!map->active);
>   vlr.hwirq = map->phys_irq;
>   vlr.state |= LR_HW;
>   vlr.state &= ~LR_EOI_INT;
>  
> - ret = irq_set_irqchip_state(map->irq,
> - IRQCHIP_STATE_ACTIVE,
> - true);
>   vgic_irq_set_queued(vcpu, irq);
the queued state is set again in vgic_queue_hwirq for level-sensitive
IRQs, although that is not harmful.
> - WARN_ON(ret);
> +
> + if (map->shared) {
> + int ret;
> +
> + BUG_ON(!map->active);
> + ret = irq_set_irqchip_state(map->irq,
> + 
> IRQCHIP_STATE_ACTIVE,
> + true);
> + WARN_ON(ret);
> + }
>   }
>   }
>  
> @@ -1368,21 +1371,28 @@ static bool vgic_process_maintenance(struct kvm_vcpu 
> *vcpu)
>  static int vgic_sync_hwirq(struct kvm_vcpu *vcpu, struct vgic_lr vlr)
>  {
>   struct irq_phys_map *map;
> + bool active;
>   int ret;
>  
>   if (!(vlr.state & LR_HW))
>   return 0;
>  
>   map = vgic_irq_map_search(vcpu, vlr.irq);
> - BUG_ON(!map || !map->active);
> + BUG_ON(!map);
> + BUG_ON(map->shared && !map->active);
>  
>   ret = irq_get_irqchip_state(map->irq,
>   IRQCHIP_STATE_ACTIVE,
> - &map->active);
> + &active);
>  
In case of non-shared and EOIMode = 1 - I know this is not your current
interest here though ;-) - once the guest EOIs its virtual IRQ and the GIC
deactivates the physical one, a new physical IRQ can hit immediately, the
physical handler can be entered, and the state is seen as active here.
The queued state is never reset in such a case and the system gets stuck
since the can_sample fails, I think. What I mean here is it sounds like
the state machine as-is does not wo

Re: [PATCH 10/10] KVM: arm/arm64: vgic: Allow non-shared device HW interrupts

2015-06-17 Thread Marc Zyngier
On 17/06/15 16:11, Eric Auger wrote:
> Hi Marc,
> On 06/08/2015 07:04 PM, Marc Zyngier wrote:
>> So far, the only use of the HW interrupt facility is the timer,
>> implying that the active state is context-switched for each vcpu,
>> as the device is is shared across all vcpus.
> s/is//
>>
>> This does not work for a device that has been assigned to a VM,
>> as the guest is entierely in control of that device (the HW is
> entirely?
>> not shared). In that case, it makes sense to bypass the whole
>> active state srtwitchint, and only track the deactivation of the
> switching

Congratulations, I think you're now ready to try deciphering my
handwriting... ;-)

>> interrupt.
>>
>> Signed-off-by: Marc Zyngier 
>> ---
>>  include/kvm/arm_vgic.h|  5 +++--
>>  virt/kvm/arm/arch_timer.c |  2 +-
>>  virt/kvm/arm/vgic.c   | 37 -
>>  3 files changed, 28 insertions(+), 16 deletions(-)
>>
>> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
>> index 1c653c1..5d47d60 100644
>> --- a/include/kvm/arm_vgic.h
>> +++ b/include/kvm/arm_vgic.h
>> @@ -164,7 +164,8 @@ struct irq_phys_map {
>>  u32 virt_irq;
>>  u32 phys_irq;
>>  u32 irq;
>> -boolactive;
>> +boolshared;
>> +boolactive; /* Only valid if shared */
>>  };
>>  
>>  struct vgic_dist {
>> @@ -347,7 +348,7 @@ void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 
>> reg);
>>  int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
>>  int kvm_vgic_vcpu_active_irq(struct kvm_vcpu *vcpu);
>>  struct irq_phys_map *vgic_map_phys_irq(struct kvm_vcpu *vcpu,
>> -   int virt_irq, int irq);
>> +   int virt_irq, int irq, bool shared);
>>  int vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map);
>>  bool vgic_get_phys_irq_active(struct irq_phys_map *map);
>>  void vgic_set_phys_irq_active(struct irq_phys_map *map, bool active);
>> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
>> index b9fff78..9544d79 100644
>> --- a/virt/kvm/arm/arch_timer.c
>> +++ b/virt/kvm/arm/arch_timer.c
>> @@ -202,7 +202,7 @@ void kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
>>   * Tell the VGIC that the virtual interrupt is tied to a
>>   * physical interrupt. We do that once per VCPU.
>>   */
>> -timer->map = vgic_map_phys_irq(vcpu, irq->irq, host_vtimer_irq);
>> +timer->map = vgic_map_phys_irq(vcpu, irq->irq, host_vtimer_irq, true);
>>  WARN_ON(!timer->map);
>>  }
>>  
>> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
>> index f376b56..4223166 100644
>> --- a/virt/kvm/arm/vgic.c
>> +++ b/virt/kvm/arm/vgic.c
>> @@ -1125,18 +1125,21 @@ static void vgic_queue_irq_to_lr(struct kvm_vcpu 
>> *vcpu, int irq,
>>  map = vgic_irq_map_search(vcpu, irq);
>>  
>>  if (map) {
>> -int ret;
>> -
>> -BUG_ON(!map->active);
>>  vlr.hwirq = map->phys_irq;
>>  vlr.state |= LR_HW;
>>  vlr.state &= ~LR_EOI_INT;
>>  
>> -ret = irq_set_irqchip_state(map->irq,
>> -IRQCHIP_STATE_ACTIVE,
>> -true);
>>  vgic_irq_set_queued(vcpu, irq);
>
> the queued state is set again in vgic_queue_hwirq for level_sensitive
> IRQs although not harmful.

Indeed. We still need it for edge interrupts though. I'll try to find a
nicer way...

>> -WARN_ON(ret);
>> +
>> +if (map->shared) {
>> +int ret;
>> +
>> +BUG_ON(!map->active);
>> +ret = irq_set_irqchip_state(map->irq,
>> +
>> IRQCHIP_STATE_ACTIVE,
>> +true);
>> +WARN_ON(ret);
>> +}
>>  }
>>  }
>>  
>> @@ -1368,21 +1371,28 @@ static bool vgic_process_maintenance(struct kvm_vcpu 
>> *vcpu)
>>  static int vgic_sync_hwirq(struct kvm_vcpu *vcpu, struct vgic_lr vlr)
>>  {
>>  struct irq_phys_map *map;
>> +bool active;
>>  int ret;
>>  
>>  if (!(vlr.state & LR_HW))
>>  return 0;
>>  
>>  map = vgic_irq_map_search(vcpu, vlr.irq);
>> -BUG_ON(!map || !map->active);
>> +BUG_ON(!map);
>> +BUG_ON(map->shared && !map->active);
>>  
>>  ret = irq_get_irqchip_state(map->irq,
>>  IRQCHIP_STATE_ACTIVE,
>> -&map->active);
>> +&active);
>>  
> In case of non shared and EOIMode = 1 - I know this is not your current
> interest here though ;-) - , once the guest EOIs its virtual IRQ and GIC
> deactivates the ph

Re: [PATCH v2 11/15] KVM: MTRR: sort variable MTRRs

2015-06-17 Thread Paolo Bonzini


On 15/06/2015 10:55, Xiao Guangrong wrote:
> Sort all valid variable MTRRs based on its base address, it will help us to
> check a range to see if it's fully contained in variable MTRRs
> 
> Signed-off-by: Xiao Guangrong 
> ---
>  arch/x86/include/asm/kvm_host.h |  3 ++
>  arch/x86/kvm/mtrr.c | 63 
> ++---
>  arch/x86/kvm/x86.c  |  2 +-
>  arch/x86/kvm/x86.h  |  1 +
>  4 files changed, 58 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index f735548..f2d60cc 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -345,12 +345,15 @@ enum {
>  struct kvm_mtrr_range {
>   u64 base;
>   u64 mask;
> + struct list_head node;
>  };
>  
>  struct kvm_mtrr {
>   struct kvm_mtrr_range var_ranges[KVM_NR_VAR_MTRR];
>   mtrr_type fixed_ranges[KVM_NR_FIXED_MTRR_REGION];
>   u64 deftype;
> +
> + struct list_head head;
>  };
>  
>  struct kvm_vcpu_arch {
> diff --git a/arch/x86/kvm/mtrr.c b/arch/x86/kvm/mtrr.c
> index cb9702d..c06ec13 100644
> --- a/arch/x86/kvm/mtrr.c
> +++ b/arch/x86/kvm/mtrr.c
> @@ -281,6 +281,52 @@ static void update_mtrr(struct kvm_vcpu *vcpu, u32 msr)
>   kvm_zap_gfn_range(vcpu->kvm, gpa_to_gfn(start), gpa_to_gfn(end));
>  }
>  
> +static bool var_mtrr_range_is_valid(struct kvm_mtrr_range *range)
> +{
> + u64 start, end;
> +
> + if (!(range->mask & (1 << 11)))
> + return false;
> +
> + var_mtrr_range(range, &start, &end);
> + return end > start;
> +}

I think this test is incorrect; it is always true unless end overflows
to zero, which cannot happen because writing an invalid value to the
MSR causes a #GP.

Paolo

> +static void set_var_mtrr_start(struct kvm_mtrr *mtrr_state, int index)
> +{
> + /* remove the entry if it's in the list. */
> + if (var_mtrr_range_is_valid(&mtrr_state->var_ranges[index]))
> + list_del(&mtrr_state->var_ranges[index].node);
> +}
> +
> +static void set_var_mtrr_end(struct kvm_mtrr *mtrr_state, int index)
> +{
> + struct kvm_mtrr_range *tmp, *cur = &mtrr_state->var_ranges[index];
> +
> + /* add it to the list if it's valid. */
> + if (var_mtrr_range_is_valid(&mtrr_state->var_ranges[index])) {
> + list_for_each_entry(tmp, &mtrr_state->head, node)
> + if (cur->base < tmp->base)
> + list_add_tail(&cur->node, &tmp->node);
> +
> + list_add_tail(&cur->node, &mtrr_state->head);
> + }
> +}
> +
> +static void set_var_mtrr_msr(struct kvm_vcpu *vcpu, u32 msr, u64 data)
> +{
> + int index, is_mtrr_mask;
> +
> + index = (msr - 0x200) / 2;
> + is_mtrr_mask = msr - 0x200 - 2 * index;
> + set_var_mtrr_start(&vcpu->arch.mtrr_state, index);
> + if (!is_mtrr_mask)
> + vcpu->arch.mtrr_state.var_ranges[index].base = data;
> + else
> + vcpu->arch.mtrr_state.var_ranges[index].mask = data;
> + set_var_mtrr_end(&vcpu->arch.mtrr_state, index);
> +}
> +
>  int kvm_mtrr_set_msr(struct kvm_vcpu *vcpu, u32 msr, u64 data)
>  {
>   int index;
> @@ -295,16 +341,8 @@ int kvm_mtrr_set_msr(struct kvm_vcpu *vcpu, u32 msr, u64 
> data)
>   vcpu->arch.mtrr_state.deftype = data;
>   else if (msr == MSR_IA32_CR_PAT)
>   vcpu->arch.pat = data;
> - else {  /* Variable MTRRs */
> - int is_mtrr_mask;
> -
> - index = (msr - 0x200) / 2;
> - is_mtrr_mask = msr - 0x200 - 2 * index;
> - if (!is_mtrr_mask)
> - vcpu->arch.mtrr_state.var_ranges[index].base = data;
> - else
> - vcpu->arch.mtrr_state.var_ranges[index].mask = data;
> - }
> + else
> + set_var_mtrr_msr(vcpu, msr, data);
>  
>   update_mtrr(vcpu, msr);
>   return 0;
> @@ -350,6 +388,11 @@ int kvm_mtrr_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 
> *pdata)
>   return 0;
>  }
>  
> +void kvm_vcpu_mtrr_init(struct kvm_vcpu *vcpu)
> +{
> + INIT_LIST_HEAD(&vcpu->arch.mtrr_state.head);
> +}
> +
>  u8 kvm_mtrr_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn)
>  {
>   struct kvm_mtrr *mtrr_state = &vcpu->arch.mtrr_state;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 2ffad7f..6574fa3 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -7379,13 +7379,13 @@ int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
>  {
>   int r;
>  
> + kvm_vcpu_mtrr_init(vcpu);
>   r = vcpu_load(vcpu);
>   if (r)
>   return r;
>   kvm_vcpu_reset(vcpu, false);
>   kvm_mmu_setup(vcpu);
>   vcpu_put(vcpu);
> -
>   return r;
>  }
>  
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index aeb0bb2..0e4727c 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -162,6 +162,7 @@ int kvm_write_guest_virt_system(struct x86_emulate_ctxt 
> *ctxt,
>

Re: [PATCH 3/5] vhost: support upto 509 memory regions

2015-06-17 Thread Michael S. Tsirkin
On Wed, Jun 17, 2015 at 05:12:57PM +0200, Igor Mammedov wrote:
> On Wed, 17 Jun 2015 16:32:02 +0200
> "Michael S. Tsirkin"  wrote:
> 
> > On Wed, Jun 17, 2015 at 03:20:44PM +0200, Paolo Bonzini wrote:
> > > 
> > > 
> > > On 17/06/2015 15:13, Michael S. Tsirkin wrote:
> > > > > > Considering userspace can be malicious, I guess yes.
> > > > > I don't think it's a valid concern in this case,
> > > > > setting limit back from 509 to 64 will not help here in any way,
> > > > > userspace still can create as many vhost instances as it needs
> > > > > to consume memory it desires.
> > > > 
> > > > Not really since vhost char device isn't world-accessible.
> > > > It's typically opened by a priveledged tool, the fd is
> > > > then passed to an unpriveledged userspace, or permissions dropped.
> > > 
> > > Then what's the concern anyway?
> > > 
> > > Paolo
> > 
> > Each fd now ties up 16K of kernel memory.  It didn't use to, so
> > priveledged tool could safely give the unpriveledged userspace
> > a ton of these fds.
> if privileged tool gives out unlimited amount of fds then it
> doesn't matter whether fd ties 4K or 16K, host still could be DoSed.
> 

Of course it does not give out unlimited fds; there's a way
for the sysadmin to specify the number of fds. Look at how libvirt
uses vhost; it should become clear, I think.

-- 
MST


Re: [PATCH v2 10/15] KVM: MTRR: introduce var_mtrr_range

2015-06-17 Thread Paolo Bonzini


On 15/06/2015 10:55, Xiao Guangrong wrote:
> It gets the range for the specified variable MTRR
> 
> Signed-off-by: Xiao Guangrong 
> ---
>  arch/x86/kvm/mtrr.c | 19 +--
>  1 file changed, 13 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kvm/mtrr.c b/arch/x86/kvm/mtrr.c
> index df73149..cb9702d 100644
> --- a/arch/x86/kvm/mtrr.c
> +++ b/arch/x86/kvm/mtrr.c
> @@ -241,10 +241,21 @@ static int fixed_msr_to_range_index(u32 msr)
>   return fixed_mtrr_seg_unit_range_index(seg, unit);
>  }
>  
> +static void var_mtrr_range(struct kvm_mtrr_range *range, u64 *start, u64 
> *end)
> +{
> + u64 mask;
> +
> + *start = range->base & PAGE_MASK;
> +
> + mask = range->mask & PAGE_MASK;
> + mask |= ~0ULL << boot_cpu_data.x86_phys_bits;
> + *end = ((*start & mask) | ~mask) + 1;

This is just (*start | ~mask) + 1.  I will adjust this.

Paolo

> +}
> +
>  static void update_mtrr(struct kvm_vcpu *vcpu, u32 msr)
>  {
>   struct kvm_mtrr *mtrr_state = &vcpu->arch.mtrr_state;
> - gfn_t start, end, mask;
> + gfn_t start, end;
>   int index;
>  
>   if (msr == MSR_IA32_CR_PAT || !tdp_enabled ||
> @@ -264,11 +275,7 @@ static void update_mtrr(struct kvm_vcpu *vcpu, u32 msr)
>   } else {
>   /* variable range MTRRs. */
>   index = (msr - 0x200) / 2;
> - start = mtrr_state->var_ranges[index].base & PAGE_MASK;
> - mask = mtrr_state->var_ranges[index].mask & PAGE_MASK;
> - mask |= ~0ULL << cpuid_maxphyaddr(vcpu);
> -
> - end = ((start & mask) | ~mask) + 1;
> + var_mtrr_range(&mtrr_state->var_ranges[index], &start, &end);
>   }
>  
>   kvm_zap_gfn_range(vcpu->kvm, gpa_to_gfn(start), gpa_to_gfn(end));
> 


Re: [PATCH 10/10] KVM: arm/arm64: vgic: Allow non-shared device HW interrupts

2015-06-17 Thread Eric Auger
On 06/17/2015 05:37 PM, Marc Zyngier wrote:
> On 17/06/15 16:11, Eric Auger wrote:
>> Hi Marc,
>> On 06/08/2015 07:04 PM, Marc Zyngier wrote:
>>> So far, the only use of the HW interrupt facility is the timer,
>>> implying that the active state is context-switched for each vcpu,
>>> as the device is is shared across all vcpus.
>> s/is//
>>>
>>> This does not work for a device that has been assigned to a VM,
>>> as the guest is entierely in control of that device (the HW is
>> entirely?
>>> not shared). In that case, it makes sense to bypass the whole
>>> active state srtwitchint, and only track the deactivation of the
>> switching
> 
> Congratulations, I think you're now ready to try deciphering my
> handwriting... ;-)
good to see you're not a machine, or maybe you do it on purpose sometimes ;-)
> 
>>> interrupt.
>>>
>>> Signed-off-by: Marc Zyngier 
>>> ---
>>>  include/kvm/arm_vgic.h|  5 +++--
>>>  virt/kvm/arm/arch_timer.c |  2 +-
>>>  virt/kvm/arm/vgic.c   | 37 -
>>>  3 files changed, 28 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
>>> index 1c653c1..5d47d60 100644
>>> --- a/include/kvm/arm_vgic.h
>>> +++ b/include/kvm/arm_vgic.h
>>> @@ -164,7 +164,8 @@ struct irq_phys_map {
>>> u32 virt_irq;
>>> u32 phys_irq;
>>> u32 irq;
>>> -   boolactive;
>>> +   boolshared;
>>> +   boolactive; /* Only valid if shared */
>>>  };
>>>  
>>>  struct vgic_dist {
>>> @@ -347,7 +348,7 @@ void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 
>>> reg);
>>>  int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
>>>  int kvm_vgic_vcpu_active_irq(struct kvm_vcpu *vcpu);
>>>  struct irq_phys_map *vgic_map_phys_irq(struct kvm_vcpu *vcpu,
>>> -  int virt_irq, int irq);
>>> +  int virt_irq, int irq, bool shared);
>>>  int vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map);
>>>  bool vgic_get_phys_irq_active(struct irq_phys_map *map);
>>>  void vgic_set_phys_irq_active(struct irq_phys_map *map, bool active);
>>> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
>>> index b9fff78..9544d79 100644
>>> --- a/virt/kvm/arm/arch_timer.c
>>> +++ b/virt/kvm/arm/arch_timer.c
>>> @@ -202,7 +202,7 @@ void kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
>>>  * Tell the VGIC that the virtual interrupt is tied to a
>>>  * physical interrupt. We do that once per VCPU.
>>>  */
>>> -   timer->map = vgic_map_phys_irq(vcpu, irq->irq, host_vtimer_irq);
>>> +   timer->map = vgic_map_phys_irq(vcpu, irq->irq, host_vtimer_irq, true);
>>> WARN_ON(!timer->map);
>>>  }
>>>  
>>> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
>>> index f376b56..4223166 100644
>>> --- a/virt/kvm/arm/vgic.c
>>> +++ b/virt/kvm/arm/vgic.c
>>> @@ -1125,18 +1125,21 @@ static void vgic_queue_irq_to_lr(struct kvm_vcpu 
>>> *vcpu, int irq,
>>> map = vgic_irq_map_search(vcpu, irq);
>>>  
>>> if (map) {
>>> -   int ret;
>>> -
>>> -   BUG_ON(!map->active);
>>> vlr.hwirq = map->phys_irq;
>>> vlr.state |= LR_HW;
>>> vlr.state &= ~LR_EOI_INT;
>>>  
>>> -   ret = irq_set_irqchip_state(map->irq,
>>> -   IRQCHIP_STATE_ACTIVE,
>>> -   true);
>>> vgic_irq_set_queued(vcpu, irq);
>>
>> the queued state is set again in vgic_queue_hwirq for level_sensitive
>> IRQs although not harmful.
> 
> Indeed. We still need it for edge interrupts though. I'll try to find a
> nicer way...
> 
>>> -   WARN_ON(ret);
>>> +
>>> +   if (map->shared) {
>>> +   int ret;
>>> +
>>> +   BUG_ON(!map->active);
>>> +   ret = irq_set_irqchip_state(map->irq,
>>> +   
>>> IRQCHIP_STATE_ACTIVE,
>>> +   true);
>>> +   WARN_ON(ret);
>>> +   }
>>> }
>>> }
>>>  
>>> @@ -1368,21 +1371,28 @@ static bool vgic_process_maintenance(struct 
>>> kvm_vcpu *vcpu)
>>>  static int vgic_sync_hwirq(struct kvm_vcpu *vcpu, struct vgic_lr vlr)
>>>  {
>>> struct irq_phys_map *map;
>>> +   bool active;
>>> int ret;
>>>  
>>> if (!(vlr.state & LR_HW))
>>> return 0;
>>>  
>>> map = vgic_irq_map_search(vcpu, vlr.irq);
>>> -   BUG_ON(!map || !map->active);
>>> +   BUG_ON(!map);
>>> +   BUG_ON(map->shared && !map->active);
>>>  
>>> ret = irq_get_irqchip_state(map->irq,
>>> IRQCHIP_STATE_ACTIVE,
>>> -   &map->active);
>>> +   

Re: [PATCH 3/5] vhost: support upto 509 memory regions

2015-06-17 Thread Igor Mammedov
On Wed, 17 Jun 2015 17:38:40 +0200
"Michael S. Tsirkin"  wrote:

> On Wed, Jun 17, 2015 at 05:12:57PM +0200, Igor Mammedov wrote:
> > On Wed, 17 Jun 2015 16:32:02 +0200
> > "Michael S. Tsirkin"  wrote:
> > 
> > > On Wed, Jun 17, 2015 at 03:20:44PM +0200, Paolo Bonzini wrote:
> > > > 
> > > > 
> > > > On 17/06/2015 15:13, Michael S. Tsirkin wrote:
> > > > > > > Considering userspace can be malicious, I guess yes.
> > > > > > I don't think it's a valid concern in this case,
> > > > > > setting limit back from 509 to 64 will not help here in any
> > > > > > way, userspace still can create as many vhost instances as
> > > > > > it needs to consume memory it desires.
> > > > > 
> > > > > Not really since vhost char device isn't world-accessible.
> > > > > It's typically opened by a priveledged tool, the fd is
> > > > > then passed to an unpriveledged userspace, or permissions
> > > > > dropped.
> > > > 
> > > > Then what's the concern anyway?
> > > > 
> > > > Paolo
> > > 
> > > Each fd now ties up 16K of kernel memory.  It didn't use to, so
> > > priveledged tool could safely give the unpriveledged userspace
> > > a ton of these fds.
> > if privileged tool gives out unlimited amount of fds then it
> > doesn't matter whether fd ties 4K or 16K, host still could be DoSed.
> > 
> 
> Of course it does not give out unlimited fds, there's a way
> for the sysadmin to specify the number of fds. Look at how libvirt
> uses vhost, it should become clear I think.
then it just means that the tool has to take the new limit into account
to partition the host in a sensible manner.
Exposing the limit as a module parameter might be of help to the tool for
getting/setting it in the way it needs.


Re: [PATCH v2 11/15] KVM: MTRR: sort variable MTRRs

2015-06-17 Thread Paolo Bonzini


On 15/06/2015 10:55, Xiao Guangrong wrote:
> + /* add it to the list if it's valid. */
> + if (var_mtrr_range_is_valid(&mtrr_state->var_ranges[index])) {
> + list_for_each_entry(tmp, &mtrr_state->head, node)
> + if (cur->base < tmp->base)
> + list_add_tail(&cur->node, &tmp->node);
> + list_add_tail(&cur->node, &mtrr_state->head);

Also, this loop looks weird.  Is this what you wanted?

list_for_each_entry(tmp, &mtrr_state->head, node)
if (cur->base >= tmp->base)
break;
list_add_tail(&cur->node, &tmp->node);

If so, can you look at kvm/queue and see if it is okay for you (so that
we can get the series in 4.2)?

Paolo

> + }
> +}


Re: [PATCH 3/5] vhost: support upto 509 memory regions

2015-06-17 Thread Michael S. Tsirkin
On Wed, Jun 17, 2015 at 06:09:21PM +0200, Igor Mammedov wrote:
> On Wed, 17 Jun 2015 17:38:40 +0200
> "Michael S. Tsirkin"  wrote:
> 
> > On Wed, Jun 17, 2015 at 05:12:57PM +0200, Igor Mammedov wrote:
> > > On Wed, 17 Jun 2015 16:32:02 +0200
> > > "Michael S. Tsirkin"  wrote:
> > > 
> > > > On Wed, Jun 17, 2015 at 03:20:44PM +0200, Paolo Bonzini wrote:
> > > > > 
> > > > > 
> > > > > On 17/06/2015 15:13, Michael S. Tsirkin wrote:
> > > > > > > > Considering userspace can be malicious, I guess yes.
> > > > > > > I don't think it's a valid concern in this case,
> > > > > > > setting limit back from 509 to 64 will not help here in any
> > > > > > > way, userspace still can create as many vhost instances as
> > > > > > > it needs to consume memory it desires.
> > > > > > 
> > > > > > Not really since vhost char device isn't world-accessible.
> > > > > > It's typically opened by a priveledged tool, the fd is
> > > > > > then passed to an unpriveledged userspace, or permissions
> > > > > > dropped.
> > > > > 
> > > > > Then what's the concern anyway?
> > > > > 
> > > > > Paolo
> > > > 
> > > > Each fd now ties up 16K of kernel memory.  It didn't use to, so
> > > > priveledged tool could safely give the unpriveledged userspace
> > > > a ton of these fds.
> > > if privileged tool gives out unlimited amount of fds then it
> > > doesn't matter whether fd ties 4K or 16K, host still could be DoSed.
> > > 
> > 
> > Of course it does not give out unlimited fds, there's a way
> > for the sysadmin to specify the number of fds. Look at how libvirt
> > uses vhost, it should become clear I think.
> then it just means that tool has to take into account a new limits
> to partition host in sensible manner.

Meanwhile old tools are vulnerable to OOM attacks.

> Exposing limit as module parameter might be of help to tool for
> getting/setting it in a way it needs.

-- 
MST


Re: [PATCH 3/5] vhost: support upto 509 memory regions

2015-06-17 Thread Paolo Bonzini


On 17/06/2015 18:30, Michael S. Tsirkin wrote:
> Meanwhile old tools are vulnerable to OOM attacks.

For each vhost device there will be likely one tap interface, and I
suspect that it takes way, way more than 16KB of memory.

Paolo


Re: [PATCH 3/5] vhost: support upto 509 memory regions

2015-06-17 Thread Michael S. Tsirkin
On Wed, Jun 17, 2015 at 06:31:32PM +0200, Paolo Bonzini wrote:
> 
> 
> On 17/06/2015 18:30, Michael S. Tsirkin wrote:
> > Meanwhile old tools are vulnerable to OOM attacks.
> 
> For each vhost device there will be likely one tap interface, and I
> suspect that it takes way, way more than 16KB of memory.
> 
> Paolo

That's not true. We have a vhost device per queue, all queues
are part of a single tap device.

-- 
MST


Re: [PATCH 3/5] vhost: support upto 509 memory regions

2015-06-17 Thread Paolo Bonzini


On 17/06/2015 18:34, Michael S. Tsirkin wrote:
> On Wed, Jun 17, 2015 at 06:31:32PM +0200, Paolo Bonzini wrote:
>>
>>
>> On 17/06/2015 18:30, Michael S. Tsirkin wrote:
>>> Meanwhile old tools are vulnerable to OOM attacks.
>>
>> For each vhost device there will be likely one tap interface, and I
>> suspect that it takes way, way more than 16KB of memory.
> 
> That's not true. We have a vhost device per queue, all queues
> are part of a single tap device.

s/tap/VCPU/ then.  A KVM VCPU also takes more than 16KB of memory.

Paolo


Re: [PATCH 3/5] vhost: support upto 509 memory regions

2015-06-17 Thread Michael S. Tsirkin
On Wed, Jun 17, 2015 at 06:38:25PM +0200, Paolo Bonzini wrote:
> 
> 
> On 17/06/2015 18:34, Michael S. Tsirkin wrote:
> > On Wed, Jun 17, 2015 at 06:31:32PM +0200, Paolo Bonzini wrote:
> >>
> >>
> >> On 17/06/2015 18:30, Michael S. Tsirkin wrote:
> >>> Meanwhile old tools are vulnerable to OOM attacks.
> >>
> >> For each vhost device there will be likely one tap interface, and I
> >> suspect that it takes way, way more than 16KB of memory.
> > 
> > That's not true. We have a vhost device per queue, all queues
> > are part of a single tap device.
> 
> s/tap/VCPU/ then.  A KVM VCPU also takes more than 16KB of memory.
> 
> Paolo

That's up to you as a kvm maintainer :)
People are already concerned about vhost device
memory usage, I'm not happy to define our user/kernel interface
in a way that forces even more memory to be used up.


-- 
MST


Re: [PATCH 3/5] vhost: support upto 509 memory regions

2015-06-17 Thread Paolo Bonzini


On 17/06/2015 18:41, Michael S. Tsirkin wrote:
> On Wed, Jun 17, 2015 at 06:38:25PM +0200, Paolo Bonzini wrote:
>>
>>
>> On 17/06/2015 18:34, Michael S. Tsirkin wrote:
>>> On Wed, Jun 17, 2015 at 06:31:32PM +0200, Paolo Bonzini wrote:


 On 17/06/2015 18:30, Michael S. Tsirkin wrote:
> Meanwhile old tools are vulnerable to OOM attacks.

 For each vhost device there will be likely one tap interface, and I
 suspect that it takes way, way more than 16KB of memory.
>>>
>>> That's not true. We have a vhost device per queue, all queues
>>> are part of a single tap device.
>>
>> s/tap/VCPU/ then.  A KVM VCPU also takes more than 16KB of memory.
> 
> That's up to you as a kvm maintainer :)

Not easy, when the CPU alone requires three (albeit non-consecutive)
pages for the VMCS, the APIC access page and the EPT root.

> People are already concerned about vhost device
> memory usage, I'm not happy to define our user/kernel interface
> in a way that forces even more memory to be used up.

So, the questions to ask are:

1) What is the memory usage like immediately after vhost is brought up,
apart from these 16K?

2) Is there anything in vhost that allocates a user-controllable amount
of memory?

3) What is the size of the data structures that support one virtqueue
(there are two of them)?  Does it depend on the size of the virtqueues?

4) Would it make sense to share memory regions between multiple vhost
devices?  Would it be hard to implement?  It would also make memory
operations O(1) rather than O(#cpus).

Paolo


[kvm:queue 94/94] mtrr.c:undefined reference to `__udivdi3'

2015-06-17 Thread kbuild test robot
tree:   git://git.kernel.org/pub/scm/virt/kvm/kvm.git queue
head:   3b1a15b8db95eff1bcd9303057c9415f650c6331
commit: 3b1a15b8db95eff1bcd9303057c9415f650c6331 [94/94] KVM: MTRR: do not map 
huge page for non-consistent range
config: i386-randconfig-i0-201524 (attached as .config)
reproduce:
  git checkout 3b1a15b8db95eff1bcd9303057c9415f650c6331
  # save the attached .config to linux build tree
  make ARCH=i386 

All error/warnings (new ones prefixed by >>):

   arch/x86/built-in.o: In function `mtrr_lookup_fixed_next':
>> mtrr.c:(.text+0x35282): undefined reference to `__udivdi3'
   arch/x86/built-in.o: In function `mtrr_lookup_start.constprop.1':
   mtrr.c:(.text+0x35529): undefined reference to `__udivdi3'

---
0-DAY kernel test infrastructure              Open Source Technology Center
http://lists.01.org/mailman/listinfo/kbuild Intel Corporation

Re: [PATCH v2] arm: KVM: force execution of HCPTR access on VM exit

2015-06-17 Thread Vikram Sethi
Hi Marc, this version of the patch works for me.
Tested-by: Vikram Sethi 

Thanks,
Vikram
On 06/17/15 04:27, Marc Zyngier wrote:
> On VM entry, we disable access to the VFP registers in order to
> perform a lazy save/restore of these registers.
>
> On VM exit, we restore access, test if we did enable them before,
> and save/restore the guest/host registers if necessary. In this
> sequence, the FPEXC register is always accessed, irrespective
> of the trapping configuration.
>
> If the guest didn't touch the VFP registers, then the HCPTR access
> has now enabled such access, but we're missing a barrier to ensure
> architectural execution of the new HCPTR configuration. If the HCPTR
> access has been delayed/reordered, the subsequent access to FPEXC
> will cause a trap, which we aren't prepared to handle at all.
>
> The same condition exists when trapping to enable VFP for the guest.
>
> The fix is to introduce a barrier after enabling VFP access. In the
> vmexit case, it can be relaxed to only take place if the guest hasn't
> accessed its view of the VFP registers, making the access to FPEXC safe.
>
> The set_hcptr macro is modified to deal with both vmenter/vmexit and
> vmtrap operations, and now takes an optional label that is branched to
> when the guest hasn't touched the VFP registers.
>
> Reported-by: Vikram Sethi 
> Cc: sta...@kernel.org # v3.9+
> Signed-off-by: Marc Zyngier 
> ---
> * From v1:
>   - Changed from a discrete fix to be integrated in set_hcptr
>   - Also introduce an ISB on vmtrap (reported by Vikram)
>   - Dropped Christoffer Reviewed-by, due to significant changes
>
>  arch/arm/kvm/interrupts.S  | 10 --
>  arch/arm/kvm/interrupts_head.S | 20 ++--
>  2 files changed, 22 insertions(+), 8 deletions(-)
>
> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
> index 79caf79..f7db3a5 100644
> --- a/arch/arm/kvm/interrupts.S
> +++ b/arch/arm/kvm/interrupts.S
> @@ -170,13 +170,9 @@ __kvm_vcpu_return:
>   @ Don't trap coprocessor accesses for host kernel
>   set_hstr vmexit
>   set_hdcr vmexit
> - set_hcptr vmexit, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
> + set_hcptr vmexit, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11)), 
> after_vfp_restore
>  
>  #ifdef CONFIG_VFPv3
> - @ Save floating point registers we if let guest use them.
> - tst r2, #(HCPTR_TCP(10) | HCPTR_TCP(11))
> - bne after_vfp_restore
> -
>   @ Switch VFP/NEON hardware state to the host's
>   add r7, vcpu, #VCPU_VFP_GUEST
>   store_vfp_state r7
> @@ -188,6 +184,8 @@ after_vfp_restore:
>   @ Restore FPEXC_EN which we clobbered on entry
>   pop {r2}
>   VFPFMXR FPEXC, r2
> +#else
> +after_vfp_restore:
>  #endif
>  
>   @ Reset Hyp-role
> @@ -483,7 +481,7 @@ switch_to_guest_vfp:
>   push{r3-r7}
>  
>   @ NEON/VFP used.  Turn on VFP access.
> - set_hcptr vmexit, (HCPTR_TCP(10) | HCPTR_TCP(11))
> + set_hcptr vmtrap, (HCPTR_TCP(10) | HCPTR_TCP(11))
>  
>   @ Switch VFP/NEON hardware state to the guest's
>   add r7, r0, #VCPU_VFP_HOST
> diff --git a/arch/arm/kvm/interrupts_head.S b/arch/arm/kvm/interrupts_head.S
> index 35e4a3a..48efe2e 100644
> --- a/arch/arm/kvm/interrupts_head.S
> +++ b/arch/arm/kvm/interrupts_head.S
> @@ -591,8 +591,13 @@ ARM_BE8(rev  r6, r6  )
>  .endm
>  
>  /* Configures the HCPTR (Hyp Coprocessor Trap Register) on entry/return
> - * (hardware reset value is 0). Keep previous value in r2. */
> -.macro set_hcptr operation, mask
> + * (hardware reset value is 0). Keep previous value in r2.
> + * An ISB is emited on vmexit/vmtrap, but executed on vmexit only if
> + * VFP wasn't already enabled (always executed on vmtrap).
> + * If a label is specified with vmexit, it is branched to if VFP wasn't
> + * enabled.
> + */
> +.macro set_hcptr operation, mask, label = none
>   mrc p15, 4, r2, c1, c1, 2
>   ldr r3, =\mask
>   .if \operation == vmentry
> @@ -601,6 +606,17 @@ ARM_BE8(rev  r6, r6  )
>   bic r3, r2, r3  @ Don't trap defined coproc-accesses
>   .endif
>   mcr p15, 4, r3, c1, c1, 2
> + .if \operation != vmentry
> + .if \operation == vmexit
> + tst r2, #(HCPTR_TCP(10) | HCPTR_TCP(11))
> + beq 1f
> + .endif
> + isb
> + .if \label != none
> + b   \label
> + .endif
> +1:
> + .endif
>  .endm
>  
>  /* Configures the HDCR (Hyp Debug Configuration Register) on entry/return


-- 
Vikram Sethi
Qualcomm Technologies Inc, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux 
Foundation Collaborative Project



Re: [kvm:queue 94/94] mtrr.c:undefined reference to `__udivdi3'

2015-06-17 Thread Paolo Bonzini


On 17/06/2015 19:02, kbuild test robot wrote:
> tree:   git://git.kernel.org/pub/scm/virt/kvm/kvm.git queue
> head:   3b1a15b8db95eff1bcd9303057c9415f650c6331
> commit: 3b1a15b8db95eff1bcd9303057c9415f650c6331 [94/94] KVM: MTRR: do not 
> map huge page for non-consistent range
> config: i386-randconfig-i0-201524 (attached as .config)
> reproduce:
>   git checkout 3b1a15b8db95eff1bcd9303057c9415f650c6331
>   # save the attached .config to linux build tree
>   make ARCH=i386 
> 
> All error/warnings (new ones prefixed by >>):
> 
>arch/x86/built-in.o: In function `mtrr_lookup_fixed_next':
>>> mtrr.c:(.text+0x35282): undefined reference to `__udivdi3'
>arch/x86/built-in.o: In function `mtrr_lookup_start.constprop.1':
>mtrr.c:(.text+0x35529): undefined reference to `__udivdi3'

Xiao, I fixed this by changing range_size to a shift count.

Paolo


Re: [PATCH 0/2] KVM: PPC: Book3S HV: Dynamic micro-threading/split-core

2015-06-17 Thread Laurent Vivier
[I am resending my message because the MLs refused the first one, which was in HTML]

On 28/05/2015 07:17, Paul Mackerras wrote:
> This patch series provides a way to use more of the capacity of each
> processor core when running guests configured with threads=1, 2 or 4
> on a POWER8 host with HV KVM, without having to change the static
> micro-threading (the official name for split-core) mode for the whole
> machine.  The problem with setting the machine to static 2-way or
> 4-way micro-threading mode is that (a) then you can't run guests with
> threads=8 and (b) selecting the right mode can be tricky and requires
> knowledge of what guests you will be running.
>
> Instead, with these two patches, we can now run more than one virtual
> core (vcore) on a given physical core if possible, and if that means
> we need to switch the core to 2-way or 4-way micro-threading mode,
> then we do that on entry to the guests and switch back to whole-core
> mode on exit (and we only switch the one core, not the whole machine).
> The core mode switching is only done if the machine is in static
> whole-core mode.
>
> All of this only comes into effect when a core is over-committed.
> When the machine is lightly loaded everything operates the same with
> these patches as without.  Only when some core has a vcore that is
> able to run while there is also another vcore that was wanting to run
> on that core but got preempted does the logic kick in to try to run
> both vcores at once.
>
> Paul.
> ---
>
>  arch/powerpc/include/asm/kvm_book3s_asm.h |  20 +
>  arch/powerpc/include/asm/kvm_host.h   |  22 +-
>  arch/powerpc/kernel/asm-offsets.c |   9 +
>  arch/powerpc/kvm/book3s_hv.c  | 648 
> ++
>  arch/powerpc/kvm/book3s_hv_builtin.c  |  32 +-
>  arch/powerpc/kvm/book3s_hv_rm_xics.c  |   4 +-
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 111 -
>  7 files changed, 740 insertions(+), 106 deletions(-)

Tested-by: Laurent Vivier 

Performance is better, but Paul, could you explain why it is better if I
disable dynamic micro-threading?
Did I miss something?

My test system is an IBM Power S822L.

I run two guests with 8 vCPUs (-smp 8,sockets=8,cores=1,threads=1), both
pinned to the same core (with the pinning option of virt-manager). Then I
measure the time needed to compile a kernel in parallel in both guests
with "make -j 16".

My kernel without micro-threading:

real    37m23.424s     real    37m24.959s
user    167m31.474s    user    165m44.142s
sys     113m26.195s    sys     113m45.072s

With micro-threading patches (PATCH 1+2):

target_smt_mode 0 [in fact it was 8 here, but it should behave like 0, as it is > max threads/sub-core]
dynamic_mt_modes 6

real    32m13.338s     real    32m26.652s
user    139m21.181s    user    140m20.994s
sys     77m35.339s     sys     78m16.599s

It's better, but if I disable dynamic micro-threading (but PATCH 1+2):

target_smt_mode 0
dynamic_mt_modes 0

real    30m49.100s     real    30m48.161s
user    144m22.989s    user    142m53.886s
sys     65m4.942s      sys     66m8.159s

it's even better.

without dynamic micro-threading patch (with PATCH1 but not PATCH2):

target_smt_mode 0

real    33m57.279s     real    34m19.524s
user    158m43.064s    user    156m19.863s
sys     74m25.442s     sys     76m42.994s


Laurent



Re: [PATCH 3/5] vhost: support upto 509 memory regions

2015-06-17 Thread Igor Mammedov
On Wed, 17 Jun 2015 18:30:02 +0200
"Michael S. Tsirkin"  wrote:

> On Wed, Jun 17, 2015 at 06:09:21PM +0200, Igor Mammedov wrote:
> > On Wed, 17 Jun 2015 17:38:40 +0200
> > "Michael S. Tsirkin"  wrote:
> > 
> > > On Wed, Jun 17, 2015 at 05:12:57PM +0200, Igor Mammedov wrote:
> > > > On Wed, 17 Jun 2015 16:32:02 +0200
> > > > "Michael S. Tsirkin"  wrote:
> > > > 
> > > > > On Wed, Jun 17, 2015 at 03:20:44PM +0200, Paolo Bonzini wrote:
> > > > > > 
> > > > > > 
> > > > > > On 17/06/2015 15:13, Michael S. Tsirkin wrote:
> > > > > > > > > Considering userspace can be malicious, I guess yes.
> > > > > > > > I don't think it's a valid concern in this case,
> > > > > > > > setting limit back from 509 to 64 will not help here in
> > > > > > > > any way, userspace still can create as many vhost
> > > > > > > > instances as it needs to consume memory it desires.
> > > > > > > 
> > > > > > > Not really since vhost char device isn't world-accessible.
> > > > > > > It's typically opened by a priveledged tool, the fd is
> > > > > > > then passed to an unpriveledged userspace, or permissions
> > > > > > > dropped.
> > > > > > 
> > > > > > Then what's the concern anyway?
> > > > > > 
> > > > > > Paolo
> > > > > 
> > > > > Each fd now ties up 16K of kernel memory.  It didn't use to,
> > > > > so priveledged tool could safely give the unpriveledged
> > > > > userspace a ton of these fds.
> > > > if privileged tool gives out unlimited amount of fds then it
> > > > doesn't matter whether fd ties 4K or 16K, host still could be
> > > > DoSed.
> > > > 
> > > 
> > > Of course it does not give out unlimited fds, there's a way
> > > for the sysadmin to specify the number of fds. Look at how libvirt
> > > uses vhost, it should become clear I think.
> > then it just means that tool has to take into account a new limits
> > to partition host in sensible manner.
> 
> Meanwhile old tools are vulnerable to OOM attacks.
Let's leave the old limit as the default and allow overriding it via a
module parameter; that way old tools won't be affected and new tools
can set the limit the way they need.
That will accommodate the current slot-hungry userspace and a new one with
a continuous HVA, and won't regress old tools.

> 
> > Exposing limit as module parameter might be of help to tool for
> > getting/setting it in a way it needs.
> 


