Re: [PATCH RFC] vhost: add ioctl to query nregions upper limit

2015-06-25 Thread Igor Mammedov
On Wed, 24 Jun 2015 17:08:56 +0200
"Michael S. Tsirkin"  wrote:

> On Wed, Jun 24, 2015 at 04:52:29PM +0200, Igor Mammedov wrote:
> > On Wed, 24 Jun 2015 16:17:46 +0200
> > "Michael S. Tsirkin"  wrote:
> > 
> > > On Wed, Jun 24, 2015 at 04:07:27PM +0200, Igor Mammedov wrote:
> > > > On Wed, 24 Jun 2015 15:49:27 +0200
> > > > "Michael S. Tsirkin"  wrote:
> > > > 
> > > > > Userspace currently simply tries to give vhost as many regions
> > > > > as it happens to have, but you only have the mem table
> > > > > when you have initialized a large part of the VM, so graceful
> > > > > failure is very hard to support.
> > > > > 
> > > > > The result is that userspace tends to fail catastrophically.
> > > > > 
> > > > > Instead, add a new ioctl so userspace can find out how much
> > > > > the kernel supports, up front. This returns a positive value that
> > > > > we commit to.
> > > > > 
> > > > > Also, document our contract with legacy userspace: when
> > > > > running on an old kernel, you get -1 and you can assume at
> > > > > least 64 slots.  Since the 0 value is left unused, let's make that
> > > > > mean that the current userspace behaviour (trial and error)
> > > > > is required, just in case we want it back.
> > > > > 
> > > > > Signed-off-by: Michael S. Tsirkin 
> > > > > Cc: Igor Mammedov 
> > > > > Cc: Paolo Bonzini 
> > > > > ---
> > > > >  include/uapi/linux/vhost.h | 17 ++++++++++++++++-
> > > > >  drivers/vhost/vhost.c      |  5 +++++
> > > > >  2 files changed, 21 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
> > > > > index ab373191..f71fa6d 100644
> > > > > --- a/include/uapi/linux/vhost.h
> > > > > +++ b/include/uapi/linux/vhost.h
> > > > > @@ -80,7 +80,7 @@ struct vhost_memory {
> > > > >   * Allows subsequent call to VHOST_OWNER_SET to succeed. */
> > > > >  #define VHOST_RESET_OWNER _IO(VHOST_VIRTIO, 0x02)
> > > > >  
> > > > > -/* Set up/modify memory layout */
> > > > > +/* Set up/modify memory layout: see also VHOST_GET_MEM_MAX_NREGIONS below. */
> > > > >  #define VHOST_SET_MEM_TABLE   _IOW(VHOST_VIRTIO, 0x03, struct vhost_memory)
> > > > >  
> > > > >  /* Write logging setup. */
> > > > > @@ -127,6 +127,21 @@ struct vhost_memory {
> > > > >  /* Set eventfd to signal an error */
> > > > >  #define VHOST_SET_VRING_ERR _IOW(VHOST_VIRTIO, 0x22, struct vhost_vring_file)
> > > > >  
> > > > > +/* Query upper limit on nregions in VHOST_SET_MEM_TABLE arguments.
> > > > > + * Returns:
> > > > > + *   0 < value <= MAX_INT - gives the upper limit, higher values will fail
> > > > > + *   0 - there's no static limit: try and see if it works
> > > > > + *   -1 - on failure
> > > > > + */
> > > > > +#define VHOST_GET_MEM_MAX_NREGIONS   _IO(VHOST_VIRTIO, 0x23)
> > > > > +
> > > > > +/* Returned by VHOST_GET_MEM_MAX_NREGIONS to mean there's no static limit:
> > > > > + * try and it'll work if you are lucky. */
> > > > > +#define VHOST_MEM_MAX_NREGIONS_NONE 0
> > > > is it needed? we always have a limit,
> > > > or don't have IOCTL => -1 => old try and see way
> > > > 
> > > > > +/* We support at least as many nregions in VHOST_SET_MEM_TABLE:
> > > > > + * for use on legacy kernels without VHOST_GET_MEM_MAX_NREGIONS support. */
> > > > > +#define VHOST_MEM_MAX_NREGIONS_DEFAULT 64
> > > > ^^^ not used below,
> > > > if it's for legacy then perhaps s/DEFAULT/LEGACY/ 
> > > 
> > > The assumption was that userspace detecting old kernels will just
> > > use 64; this means we do want a flag to get the old way.
> > > 
> > > OTOH if you don't think it's useful, let me know.
> > this header will be synced into QEMU's tree so that we can use
> > this define there, won't it? IMHO _LEGACY is then a more exact
> > description of the macro.
>

Re: [PATCH RFC] vhost: add ioctl to query nregions upper limit

2015-06-24 Thread Igor Mammedov
On Wed, 24 Jun 2015 16:17:46 +0200
"Michael S. Tsirkin"  wrote:

> On Wed, Jun 24, 2015 at 04:07:27PM +0200, Igor Mammedov wrote:
> > On Wed, 24 Jun 2015 15:49:27 +0200
> > "Michael S. Tsirkin"  wrote:
> > 
> > > Userspace currently simply tries to give vhost as many regions
> > > as it happens to have, but you only have the mem table
> > > when you have initialized a large part of the VM, so graceful
> > > failure is very hard to support.
> > > 
> > > The result is that userspace tends to fail catastrophically.
> > > 
> > > Instead, add a new ioctl so userspace can find out how much the kernel
> > > supports, up front. This returns a positive value that we commit to.
> > > 
> > > Also, document our contract with legacy userspace: when running on an
> > > old kernel, you get -1 and you can assume at least 64 slots.  Since the 0
> > > value is left unused, let's make that mean that the current userspace
> > > behaviour (trial and error) is required, just in case we want it back.
> > > 
> > > Signed-off-by: Michael S. Tsirkin 
> > > Cc: Igor Mammedov 
> > > Cc: Paolo Bonzini 
> > > ---
> > >  include/uapi/linux/vhost.h | 17 ++++++++++++++++-
> > >  drivers/vhost/vhost.c      |  5 +++++
> > >  2 files changed, 21 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
> > > index ab373191..f71fa6d 100644
> > > --- a/include/uapi/linux/vhost.h
> > > +++ b/include/uapi/linux/vhost.h
> > > @@ -80,7 +80,7 @@ struct vhost_memory {
> > >   * Allows subsequent call to VHOST_OWNER_SET to succeed. */
> > >  #define VHOST_RESET_OWNER _IO(VHOST_VIRTIO, 0x02)
> > >  
> > > -/* Set up/modify memory layout */
> > > +/* Set up/modify memory layout: see also VHOST_GET_MEM_MAX_NREGIONS below. */
> > >  #define VHOST_SET_MEM_TABLE  _IOW(VHOST_VIRTIO, 0x03, struct vhost_memory)
> > >  
> > >  /* Write logging setup. */
> > > @@ -127,6 +127,21 @@ struct vhost_memory {
> > >  /* Set eventfd to signal an error */
> > >  #define VHOST_SET_VRING_ERR _IOW(VHOST_VIRTIO, 0x22, struct vhost_vring_file)
> > >  
> > > +/* Query upper limit on nregions in VHOST_SET_MEM_TABLE arguments.
> > > + * Returns:
> > > + *   0 < value <= MAX_INT - gives the upper limit, higher values will fail
> > > + *   0 - there's no static limit: try and see if it works
> > > + *   -1 - on failure
> > > + */
> > > +#define VHOST_GET_MEM_MAX_NREGIONS   _IO(VHOST_VIRTIO, 0x23)
> > > +
> > > +/* Returned by VHOST_GET_MEM_MAX_NREGIONS to mean there's no static limit:
> > > + * try and it'll work if you are lucky. */
> > > +#define VHOST_MEM_MAX_NREGIONS_NONE 0
> > is it needed? we always have a limit,
> > or don't have IOCTL => -1 => old try and see way
> > 
> > > +/* We support at least as many nregions in VHOST_SET_MEM_TABLE:
> > > + * for use on legacy kernels without VHOST_GET_MEM_MAX_NREGIONS support. */
> > > +#define VHOST_MEM_MAX_NREGIONS_DEFAULT 64
> > ^^^ not used below,
> > if it's for legacy then perhaps s/DEFAULT/LEGACY/ 
> 
> The assumption was that userspace detecting old kernels will just use 64;
> this means we do want a flag to get the old way.
> 
> OTOH if you don't think it's useful, let me know.
this header will be synced into QEMU's tree so that we can use this define
there, won't it? IMHO _LEGACY is then a more exact description of the macro.

As for the 0 return value, -1 is just fine for detecting old kernels (i.e. try
and see if it works), so 0 looks unnecessary, but it doesn't hurt in any way either.
For me, a limit or -1 is enough to try to fix userspace.

> 
> > > +
> > >  /* VHOST_NET specific defines */
> > >  
> > >  /* Attach virtio net ring to a raw socket, or tap device.
> > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > index 9e8e004..3b68f9d 100644
> > > --- a/drivers/vhost/vhost.c
> > > +++ b/drivers/vhost/vhost.c
> > > @@ -917,6 +917,11 @@ long vhost_dev_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *argp)
> > >   long r;
> > >   int i, fd;
> > >  
> > > + if (ioctl == VHOST_GET_MEM_MAX_NREGIONS) {
> > > + r = VHOST_MEMORY_MAX_NREGIONS;
> > > + goto done;
> > > + }
> > > +
> > >   /* If you are not the owner, you can become one */
> > >   if (ioctl == VHOST_SET_OWNER) {
> > >   r = vhost_dev_set_owner(d);
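For illustration, here is a minimal userspace sketch of the return-value contract
discussed above (a positive limit, 0 for "no static limit", -1 on old kernels). It
assumes a header carrying the macros from the patch quoted above; the helper name
is made up and error handling is simplified, so treat it as a sketch rather than
actual QEMU code.

  #include <limits.h>
  #include <sys/ioctl.h>
  #include <linux/vhost.h>

  /* Ask the kernel how many regions VHOST_SET_MEM_TABLE will accept. */
  static int vhost_probe_max_nregions(int vhost_fd)
  {
          int r = ioctl(vhost_fd, VHOST_GET_MEM_MAX_NREGIONS);

          if (r > 0)
                  return r;        /* static limit reported by the kernel */
          if (r == VHOST_MEM_MAX_NREGIONS_NONE)
                  return INT_MAX;  /* no static limit: try and see */
          /* r < 0: old kernel without the ioctl (ENOTTY) or a real failure;
           * the documented legacy contract guarantees at least 64 slots. */
          return VHOST_MEM_MAX_NREGIONS_DEFAULT;
  }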



Re: [PATCH RFC] vhost: add ioctl to query nregions upper limit

2015-06-24 Thread Igor Mammedov
On Wed, 24 Jun 2015 15:49:27 +0200
"Michael S. Tsirkin"  wrote:

> Userspace currently simply tries to give vhost as many regions
> as it happens to have, but you only have the mem table
> when you have initialized a large part of the VM, so graceful
> failure is very hard to support.
> 
> The result is that userspace tends to fail catastrophically.
> 
> Instead, add a new ioctl so userspace can find out how much the kernel
> supports, up front. This returns a positive value that we commit to.
> 
> Also, document our contract with legacy userspace: when running on an
> old kernel, you get -1 and you can assume at least 64 slots.  Since the 0
> value is left unused, let's make that mean that the current userspace
> behaviour (trial and error) is required, just in case we want it back.
> 
> Signed-off-by: Michael S. Tsirkin 
> Cc: Igor Mammedov 
> Cc: Paolo Bonzini 
> ---
>  include/uapi/linux/vhost.h | 17 ++++++++++++++++-
>  drivers/vhost/vhost.c      |  5 +++++
>  2 files changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
> index ab373191..f71fa6d 100644
> --- a/include/uapi/linux/vhost.h
> +++ b/include/uapi/linux/vhost.h
> @@ -80,7 +80,7 @@ struct vhost_memory {
>   * Allows subsequent call to VHOST_OWNER_SET to succeed. */
>  #define VHOST_RESET_OWNER _IO(VHOST_VIRTIO, 0x02)
>  
> -/* Set up/modify memory layout */
> +/* Set up/modify memory layout: see also VHOST_GET_MEM_MAX_NREGIONS below. */
>  #define VHOST_SET_MEM_TABLE  _IOW(VHOST_VIRTIO, 0x03, struct vhost_memory)
>  
>  /* Write logging setup. */
> @@ -127,6 +127,21 @@ struct vhost_memory {
>  /* Set eventfd to signal an error */
>  #define VHOST_SET_VRING_ERR _IOW(VHOST_VIRTIO, 0x22, struct vhost_vring_file)
>  
> +/* Query upper limit on nregions in VHOST_SET_MEM_TABLE arguments.
> + * Returns:
> + *   0 < value <= MAX_INT - gives the upper limit, higher values will fail
> + *   0 - there's no static limit: try and see if it works
> + *   -1 - on failure
> + */
> +#define VHOST_GET_MEM_MAX_NREGIONS   _IO(VHOST_VIRTIO, 0x23)
> +
> +/* Returned by VHOST_GET_MEM_MAX_NREGIONS to mean there's no static limit:
> + * try and it'll work if you are lucky. */
> +#define VHOST_MEM_MAX_NREGIONS_NONE 0
is it needed? we always have a limit,
or don't have IOCTL => -1 => old try and see way

> +/* We support at least as many nregions in VHOST_SET_MEM_TABLE:
> + * for use on legacy kernels without VHOST_GET_MEM_MAX_NREGIONS support. */
> +#define VHOST_MEM_MAX_NREGIONS_DEFAULT 64
^^^ not used below,
if it's for legacy then perhaps s/DEFAULT/LEGACY/ 

> +
>  /* VHOST_NET specific defines */
>  
>  /* Attach virtio net ring to a raw socket, or tap device.
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 9e8e004..3b68f9d 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -917,6 +917,11 @@ long vhost_dev_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *argp)
>   long r;
>   int i, fd;
>  
> + if (ioctl == VHOST_GET_MEM_MAX_NREGIONS) {
> + r = VHOST_MEMORY_MAX_NREGIONS;
> + goto done;
> + }
> +
>   /* If you are not the owner, you can become one */
>   if (ioctl == VHOST_SET_OWNER) {
>   r = vhost_dev_set_owner(d);



Re: [PATCH] vhost: support up to 509 memory regions

2015-05-19 Thread Igor Mammedov
On Mon, 18 May 2015 19:22:34 +0300
Andrey Korolyov  wrote:

> On Wed, Feb 18, 2015 at 7:27 AM, Michael S. Tsirkin  wrote:
> > On Tue, Feb 17, 2015 at 04:53:45PM -0800, Eric Northup wrote:
> >> On Tue, Feb 17, 2015 at 4:32 AM, Michael S. Tsirkin  
> >> wrote:
> >> > On Tue, Feb 17, 2015 at 11:59:48AM +0100, Paolo Bonzini wrote:
> >> >>
> >> >>
> >> >> On 17/02/2015 10:02, Michael S. Tsirkin wrote:
> >> >> > > Increasing VHOST_MEMORY_MAX_NREGIONS from 65 to 509
> >> >> > > to match KVM_USER_MEM_SLOTS fixes the issue for vhost-net.
> >> >> > >
> >> >> > > Signed-off-by: Igor Mammedov 
> >> >> >
> >> >> > This scares me a bit: each region is 32 bytes; we are talking about
> >> >> > a 16K allocation that userspace can trigger.
> >> >>
> >> >> What's bad with a 16K allocation?
> >> >
> >> > It fails when memory is fragmented.
> >> >
> >> >> > How does kvm handle this issue?
> >> >>
> >> >> It doesn't.
> >> >>
> >> >> Paolo
> >> >
> >> > I'm guessing kvm doesn't do memory scans on the data path;
> >> > vhost does.
> >> >
> >> > qemu is just doing things that the kernel didn't expect it to need.
> >> >
> >> > Instead, I suggest reducing number of GPA<->HVA mappings:
> >> >
> >> > you have GPA 1,5,7
> >> > map them at HVA 11,15,17
> >> > then you can have 1 slot: 1->11
> >> >
> >> > To avoid libc reusing the memory holes, reserve them with MAP_NORESERVE
> >> > or something like this.
> >>
> >> This works beautifully when host virtual address bits are more
> >> plentiful than guest physical address bits.  Not all architectures
> >> have that property, though.
> >
> > AFAIK this is pretty much a requirement for both kvm and vhost,
> > as we require each guest page to also be mapped in qemu memory.
> >
> >> > We can discuss smarter lookup algorithms but I'd rather
> >> > userspace didn't do things that we then have to
> >> > work around in kernel.
> >> >
> >> >
> >> > --
> >> > MST
> 
> 
> Hello,
> 
> any chance of getting the proposed patch into mainline? Though it
> seems that most users will not suffer from the relatively low slot-number
> ceiling (they can decrease slot 'granularity' for larger VMs and
> vice versa), a fine slot size, 256M or even 128M, combined with a large
> number of slots can be useful for certain kinds of tasks in
> orchestration systems. I've made a backport series of all the seemingly
> interesting memslot-related improvements to a 3.10 branch; is it worth
> testing that with a straightforward patch like the one above, with
> simulated fragmentation of allocations on the host?

I'm almost done with the approach suggested by Paolo,
i.e. replacing the linear search with a faster/scalable lookup algorithm.
Hope to post it soon.
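
For context, a sketch of the kind of faster lookup being discussed: a binary search
over the vhost memory regions, assuming they are kept sorted by guest_phys_addr and
do not overlap. This only illustrates the idea and is not necessarily the
implementation Igor posted.

  /* Find the region covering a guest-physical address in O(log nregions). */
  static const struct vhost_memory_region *
  find_region_bsearch(const struct vhost_memory *mem, __u64 addr)
  {
          int lo = 0, hi = (int)mem->nregions - 1;

          while (lo <= hi) {
                  int mid = lo + (hi - lo) / 2;
                  const struct vhost_memory_region *reg = &mem->regions[mid];

                  if (addr < reg->guest_phys_addr)
                          hi = mid - 1;
                  else if (addr >= reg->guest_phys_addr + reg->memory_size)
                          lo = mid + 1;
                  else
                          return reg;   /* addr falls inside this region */
          }
          return NULL;                  /* no region covers addr */
  }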


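For reference, a rough userspace sketch of the GPA->HVA consolidation Michael
suggests earlier in the thread: reserve one contiguous HVA window for the whole
guest-physical range (MAP_NORESERVE keeps the reservation cheap and stops libc
from reusing the holes), then back each RAM block inside that window at its GPA
offset, so a single region can describe the entire guest memory. The function
names are made up and error checking (MAP_FAILED) is omitted; this is an
assumption-laden sketch, not how QEMU actually allocates guest RAM.

  #include <stdint.h>
  #include <sys/mman.h>

  /* Reserve, but do not back, an HVA window spanning the whole GPA range. */
  static void *reserve_gpa_window(size_t guest_phys_size)
  {
          return mmap(NULL, guest_phys_size, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
  }

  /* Back one RAM block at the HVA matching its guest-physical offset. */
  static void *place_ram_block(void *window, uint64_t gpa, size_t size)
  {
          return mmap((char *)window + gpa, size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
  }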