Re: [PATCH RFC] vhost: add ioctl to query nregions upper limit
On Wed, 24 Jun 2015 17:08:56 +0200
"Michael S. Tsirkin" wrote:

> On Wed, Jun 24, 2015 at 04:52:29PM +0200, Igor Mammedov wrote:
> > On Wed, 24 Jun 2015 16:17:46 +0200
> > "Michael S. Tsirkin" wrote:
> >
> > > On Wed, Jun 24, 2015 at 04:07:27PM +0200, Igor Mammedov wrote:
> > > > On Wed, 24 Jun 2015 15:49:27 +0200
> > > > "Michael S. Tsirkin" wrote:
> > > >
> > > > > Userspace currently simply tries to give vhost as many regions
> > > > > as it happens to have, but you only have the mem table
> > > > > when you have initialized a large part of VM, so graceful
> > > > > failure is very hard to support.
> > > > >
> > > > > The result is that userspace tends to fail catastrophically.
> > > > >
> > > > > Instead, add a new ioctl so userspace can find out how much
> > > > > kernel supports, up front. This returns a positive value that
> > > > > we commit to.
> > > > >
> > > > > Also, document our contract with legacy userspace: when
> > > > > running on an old kernel, you get -1 and you can assume at
> > > > > least 64 slots. Since 0 value's left unused, let's make that
> > > > > mean that the current userspace behaviour (trial and error)
> > > > > is required, just in case we want it back.
> > > > >
> > > > > Signed-off-by: Michael S. Tsirkin
> > > > > Cc: Igor Mammedov
> > > > > Cc: Paolo Bonzini
> > > > > ---
> > > > > include/uapi/linux/vhost.h | 17 -
> > > > > drivers/vhost/vhost.c | 5 +
> > > > > 2 files changed, 21 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/include/uapi/linux/vhost.h
> > > > > b/include/uapi/linux/vhost.h index ab373191..f71fa6d 100644
> > > > > --- a/include/uapi/linux/vhost.h
> > > > > +++ b/include/uapi/linux/vhost.h
> > > > > @@ -80,7 +80,7 @@ struct vhost_memory {
> > > > > * Allows subsequent call to VHOST_OWNER_SET to succeed. */
> > > > > #define VHOST_RESET_OWNER _IO(VHOST_VIRTIO, 0x02)
> > > > >
> > > > > -/* Set up/modify memory layout */
> > > > > +/* Set up/modify memory layout: see also
> > > > > VHOST_GET_MEM_MAX_NREGIONS below. */ #define
> > > > > VHOST_SET_MEM_TABLE _IOW(VHOST_VIRTIO, 0x03, struct
> > > > > vhost_memory) /* Write logging setup. */
> > > > > @@ -127,6 +127,21 @@ struct vhost_memory {
> > > > > /* Set eventfd to signal an error */
> > > > > #define VHOST_SET_VRING_ERR _IOW(VHOST_VIRTIO, 0x22, struct
> > > > > vhost_vring_file)
> > > > > +/* Query upper limit on nregions in VHOST_SET_MEM_TABLE
> > > > > arguments.
> > > > > + * Returns:
> > > > > + * 0 < value <= MAX_INT - gives the upper limit,
> > > > > higher values will fail
> > > > > + * 0 - there's no static limit: try and see if it
> > > > > works
> > > > > + * -1 - on failure
> > > > > + */
> > > > > +#define VHOST_GET_MEM_MAX_NREGIONS _IO(VHOST_VIRTIO, 0x23)
> > > > > +
> > > > > +/* Returned by VHOST_GET_MEM_MAX_NREGIONS to mean there's no
> > > > > static limit:
> > > > > + * try and it'll work if you are lucky. */
> > > > > +#define VHOST_MEM_MAX_NREGIONS_NONE 0
> > > > is it needed? we always have a limit,
> > > > or don't have IOCTL => -1 => old try and see way
> > > >
> > > > > +/* We support at least as many nregions in
> > > > > VHOST_SET_MEM_TABLE:
> > > > > + * for use on legacy kernels without
> > > > > VHOST_GET_MEM_MAX_NREGIONS support. */ +#define
> > > > > VHOST_MEM_MAX_NREGIONS_DEFAULT 64
> > > > ^^^ not used below,
> > > > if it's for legacy then perhaps s/DEFAULT/LEGACY/
> > >
> > > The assumption was that userspace detecting old kernels will just
> > > use 64, this means we do want a flag to get the old way.
> > >
> > > OTOH if you won't think it's useful, let me know.
> > this header will be synced into QEMU's tree so that we could use
> > this define there, isn't it? IMHO then _LEGACY is more exact
Re: [PATCH RFC] vhost: add ioctl to query nregions upper limit
On Wed, 24 Jun 2015 16:17:46 +0200
"Michael S. Tsirkin" wrote:

> On Wed, Jun 24, 2015 at 04:07:27PM +0200, Igor Mammedov wrote:
> > On Wed, 24 Jun 2015 15:49:27 +0200
> > "Michael S. Tsirkin" wrote:
> >
> > > Userspace currently simply tries to give vhost as many regions
> > > as it happens to have, but you only have the mem table
> > > when you have initialized a large part of VM, so graceful
> > > failure is very hard to support.
> > >
> > > The result is that userspace tends to fail catastrophically.
> > >
> > > Instead, add a new ioctl so userspace can find out how much kernel
> > > supports, up front. This returns a positive value that we commit to.
> > >
> > > Also, document our contract with legacy userspace: when running on an
> > > old kernel, you get -1 and you can assume at least 64 slots. Since 0
> > > value's left unused, let's make that mean that the current userspace
> > > behaviour (trial and error) is required, just in case we want it back.
> > >
> > > Signed-off-by: Michael S. Tsirkin
> > > Cc: Igor Mammedov
> > > Cc: Paolo Bonzini
> > > ---
> > > include/uapi/linux/vhost.h | 17 -
> > > drivers/vhost/vhost.c | 5 +
> > > 2 files changed, 21 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
> > > index ab373191..f71fa6d 100644
> > > --- a/include/uapi/linux/vhost.h
> > > +++ b/include/uapi/linux/vhost.h
> > > @@ -80,7 +80,7 @@ struct vhost_memory {
> > > * Allows subsequent call to VHOST_OWNER_SET to succeed. */
> > > #define VHOST_RESET_OWNER _IO(VHOST_VIRTIO, 0x02)
> > >
> > > -/* Set up/modify memory layout */
> > > +/* Set up/modify memory layout: see also VHOST_GET_MEM_MAX_NREGIONS
> > > below. */
> > > #define VHOST_SET_MEM_TABLE _IOW(VHOST_VIRTIO, 0x03, struct
> > > vhost_memory)
> > >
> > > /* Write logging setup. */
> > > @@ -127,6 +127,21 @@ struct vhost_memory {
> > > /* Set eventfd to signal an error */
> > > #define VHOST_SET_VRING_ERR _IOW(VHOST_VIRTIO, 0x22, struct
> > > vhost_vring_file)
> > >
> > > +/* Query upper limit on nregions in VHOST_SET_MEM_TABLE arguments.
> > > + * Returns:
> > > + * 0 < value <= MAX_INT - gives the upper limit, higher values
> > > will fail
> > > + * 0 - there's no static limit: try and see if it works
> > > + * -1 - on failure
> > > + */
> > > +#define VHOST_GET_MEM_MAX_NREGIONS _IO(VHOST_VIRTIO, 0x23)
> > > +
> > > +/* Returned by VHOST_GET_MEM_MAX_NREGIONS to mean there's no static
> > > limit:
> > > + * try and it'll work if you are lucky. */
> > > +#define VHOST_MEM_MAX_NREGIONS_NONE 0
> > is it needed? we always have a limit,
> > or don't have IOCTL => -1 => old try and see way
> >
> > > +/* We support at least as many nregions in VHOST_SET_MEM_TABLE:
> > > + * for use on legacy kernels without VHOST_GET_MEM_MAX_NREGIONS support.
> > > */
> > > +#define VHOST_MEM_MAX_NREGIONS_DEFAULT 64
> > ^^^ not used below,
> > if it's for legacy then perhaps s/DEFAULT/LEGACY/
>
> The assumption was that userspace detecting old kernels will just use 64,
> this means we do want a flag to get the old way.
>
> OTOH if you won't think it's useful, let me know.
this header will be synced into QEMU's tree so that we could use
this define there, isn't it? IMHO then _LEGACY is more exact
description of macro.

As for 0 return value, -1 is just fine for detecting old kernels
(i.e. try and see if it works), so 0 looks unnecessary but it
doesn't in any way hurt either.

For me limit or -1 is enough to try fix userspace.

>
> > > +
> > > /* VHOST_NET specific defines */
> > >
> > > /* Attach virtio net ring to a raw socket, or tap device.
> > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > index 9e8e004..3b68f9d 100644
> > > --- a/drivers/vhost/vhost.c
> > > +++ b/drivers/vhost/vhost.c
> > > @@ -917,6 +917,11 @@ long vhost_dev_ioctl(struct vhost_dev *d, unsigned
> > > int ioctl, void __user *argp)
> > > long r;
> > > int i, fd;
> > >
> > > + if (ioctl == VHOST_GET_MEM_MAX_NREGIONS) {
> > > + r = VHOST_MEMORY_MAX_NREGIONS;
> > > + goto done;
> > > + }
> > > +
> > > /* If you are not the owner, you can become one */
> > > if (ioctl == VHOST_SET_OWNER) {
> > > r = vhost_dev_set_owner(d);
Re: [PATCH RFC] vhost: add ioctl to query nregions upper limit
On Wed, 24 Jun 2015 15:49:27 +0200
"Michael S. Tsirkin" wrote:

> Userspace currently simply tries to give vhost as many regions
> as it happens to have, but you only have the mem table
> when you have initialized a large part of VM, so graceful
> failure is very hard to support.
>
> The result is that userspace tends to fail catastrophically.
>
> Instead, add a new ioctl so userspace can find out how much kernel
> supports, up front. This returns a positive value that we commit to.
>
> Also, document our contract with legacy userspace: when running on an
> old kernel, you get -1 and you can assume at least 64 slots. Since 0
> value's left unused, let's make that mean that the current userspace
> behaviour (trial and error) is required, just in case we want it back.
>
> Signed-off-by: Michael S. Tsirkin
> Cc: Igor Mammedov
> Cc: Paolo Bonzini
> ---
> include/uapi/linux/vhost.h | 17 -
> drivers/vhost/vhost.c | 5 +
> 2 files changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
> index ab373191..f71fa6d 100644
> --- a/include/uapi/linux/vhost.h
> +++ b/include/uapi/linux/vhost.h
> @@ -80,7 +80,7 @@ struct vhost_memory {
> * Allows subsequent call to VHOST_OWNER_SET to succeed. */
> #define VHOST_RESET_OWNER _IO(VHOST_VIRTIO, 0x02)
>
> -/* Set up/modify memory layout */
> +/* Set up/modify memory layout: see also VHOST_GET_MEM_MAX_NREGIONS below. */
> #define VHOST_SET_MEM_TABLE _IOW(VHOST_VIRTIO, 0x03, struct vhost_memory)
>
> /* Write logging setup. */
> @@ -127,6 +127,21 @@ struct vhost_memory {
> /* Set eventfd to signal an error */
> #define VHOST_SET_VRING_ERR _IOW(VHOST_VIRTIO, 0x22, struct vhost_vring_file)
>
> +/* Query upper limit on nregions in VHOST_SET_MEM_TABLE arguments.
> + * Returns:
> + * 0 < value <= MAX_INT - gives the upper limit, higher values will fail
> + * 0 - there's no static limit: try and see if it works
> + * -1 - on failure
> + */
> +#define VHOST_GET_MEM_MAX_NREGIONS _IO(VHOST_VIRTIO, 0x23)
> +
> +/* Returned by VHOST_GET_MEM_MAX_NREGIONS to mean there's no static limit:
> + * try and it'll work if you are lucky. */
> +#define VHOST_MEM_MAX_NREGIONS_NONE 0
is it needed? we always have a limit,
or don't have IOCTL => -1 => old try and see way

> +/* We support at least as many nregions in VHOST_SET_MEM_TABLE:
> + * for use on legacy kernels without VHOST_GET_MEM_MAX_NREGIONS support. */
> +#define VHOST_MEM_MAX_NREGIONS_DEFAULT 64
^^^ not used below,
if it's for legacy then perhaps s/DEFAULT/LEGACY/

> +
> /* VHOST_NET specific defines */
>
> /* Attach virtio net ring to a raw socket, or tap device.
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 9e8e004..3b68f9d 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -917,6 +917,11 @@ long vhost_dev_ioctl(struct vhost_dev *d, unsigned int
> ioctl, void __user *argp)
> long r;
> int i, fd;
>
> + if (ioctl == VHOST_GET_MEM_MAX_NREGIONS) {
> + r = VHOST_MEMORY_MAX_NREGIONS;
> + goto done;
> + }
> +
> /* If you are not the owner, you can become one */
> if (ioctl == VHOST_SET_OWNER) {
> r = vhost_dev_set_owner(d);
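
For readers following the thread, the return-value contract discussed above could be consumed from userspace roughly as follows. This is only a sketch under stated assumptions: the ioctl number and the NONE value are copied from the RFC patch and may not exist in any released <linux/vhost.h>, so they are defined locally; VHOST_VIRTIO is 0xAF as in the mainline header; the _LEGACY name follows the rename suggested in review; the two helper functions are hypothetical and not part of QEMU or the kernel.

```c
#include <sys/ioctl.h>

/* Assumption: values taken from the RFC patch in this thread, defined
 * locally because a released <linux/vhost.h> may not provide them. */
#define VHOST_VIRTIO 0xAF
#ifndef VHOST_GET_MEM_MAX_NREGIONS
#define VHOST_GET_MEM_MAX_NREGIONS _IO(VHOST_VIRTIO, 0x23)
#endif
#define VHOST_MEM_MAX_NREGIONS_NONE   0
#define VHOST_MEM_MAX_NREGIONS_LEGACY 64 /* "_DEFAULT" in the patch as posted */

/* Hypothetical helper: turn the raw ioctl result into a region count.
 *   ret > 0  -> limit the kernel commits to
 *   ret == 0 -> no static limit, caller falls back to trial and error
 *   ret < 0  -> ioctl failed (e.g. old kernel): assume the legacy 64 */
long vhost_usable_nregions(long ret)
{
    if (ret < 0)
        return VHOST_MEM_MAX_NREGIONS_LEGACY;
    return ret;
}

/* Hypothetical wrapper: query an already opened vhost device fd. */
long vhost_query_nregions(int vhost_fd)
{
    return vhost_usable_nregions(ioctl(vhost_fd, VHOST_GET_MEM_MAX_NREGIONS));
}
```

With this shape userspace can check its region count before it has built most of the VM, which is the graceful-failure case the patch description is about.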
Re: [PATCH] vhost: support upto 509 memory regions
On Mon, 18 May 2015 19:22:34 +0300
Andrey Korolyov wrote:

> On Wed, Feb 18, 2015 at 7:27 AM, Michael S. Tsirkin wrote:
> > On Tue, Feb 17, 2015 at 04:53:45PM -0800, Eric Northup wrote:
> >> On Tue, Feb 17, 2015 at 4:32 AM, Michael S. Tsirkin
> >> wrote:
> >> > On Tue, Feb 17, 2015 at 11:59:48AM +0100, Paolo Bonzini wrote:
> >> >>
> >> >>
> >> >> On 17/02/2015 10:02, Michael S. Tsirkin wrote:
> >> >> > > Increasing VHOST_MEMORY_MAX_NREGIONS from 65 to 509
> >> >> > > to match KVM_USER_MEM_SLOTS fixes issue for vhost-net.
> >> >> > >
> >> >> > > Signed-off-by: Igor Mammedov
> >> >> >
> >> >> > This scares me a bit: each region is 32byte, we are talking
> >> >> > a 16K allocation that userspace can trigger.
> >> >>
> >> >> What's bad with a 16K allocation?
> >> >
> >> > It fails when memory is fragmented.
> >> >
> >> >> > How does kvm handle this issue?
> >> >>
> >> >> It doesn't.
> >> >>
> >> >> Paolo
> >> >
> >> > I'm guessing kvm doesn't do memory scans on data path,
> >> > vhost does.
> >> >
> >> > qemu is just doing things that kernel didn't expect it to need.
> >> >
> >> > Instead, I suggest reducing number of GPA<->HVA mappings:
> >> >
> >> > you have GPA 1,5,7
> >> > map them at HVA 11,15,17
> >> > then you can have 1 slot: 1->11
> >> >
> >> > To avoid libc reusing the memory holes, reserve them with MAP_NORESERVE
> >> > or something like this.
> >>
> >> This works beautifully when host virtual address bits are more
> >> plentiful than guest physical address bits. Not all architectures
> >> have that property, though.
> >
> > AFAIK this is pretty much a requirement for both kvm and vhost,
> > as we require each guest page to also be mapped in qemu memory.
> >
> >> > We can discuss smarter lookup algorithms but I'd rather
> >> > userspace didn't do things that we then have to
> >> > work around in kernel.
> >> >
> >> >
> >> > --
> >> > MST
>
>
> Hello,
>
> any chance of getting the proposed patch in the mainline? Though it
> seems that most users will not suffer from relatively slot number
> ceiling (they can decrease slot 'granularity' for larger VMs and
> vice-versa), fine slot size, 256M or even 128M, with the large number
> of slots can be useful for a certain kind of tasks for
> orchestration systems. I've made a backport series of all seemingly
> interesting memslot-related improvements to a 3.10 branch, is it worth
> to be tested with straightforward patch like one from above, with
> simulated fragmentation of allocations in host?

I'm almost done with approach suggested by Paolo,
i.e. replace linear search with faster/scalable lookup alg.
Hope to post it soon.
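
The single-slot suggestion quoted above (GPA 1,5,7 mapped at HVA 11,15,17, described by one slot 1->11) depends on every region sharing the same HVA minus GPA offset. A minimal sketch of that check, using a hypothetical struct region and helper name that do not come from QEMU or the kernel, and the same unitless addresses as the example in the mail:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one guest memory region. */
struct region {
    uint64_t gpa;   /* guest physical address */
    uint64_t hva;   /* host virtual address   */
    uint64_t size;  /* length of the region   */
};

/* If all regions share one (hva - gpa) offset, fill *slot with a single
 * mapping spanning them, holes included, and return true. The holes are
 * only safe if userspace has reserved them (the mail suggests
 * MAP_NORESERVE) so that nothing else gets mapped there. */
bool merge_into_one_slot(const struct region *r, size_t n, struct region *slot)
{
    if (n == 0)
        return false;

    uint64_t delta = r[0].hva - r[0].gpa; /* unsigned wraparound is defined */
    uint64_t lo = r[0].gpa;
    uint64_t hi = r[0].gpa + r[0].size;

    for (size_t i = 1; i < n; i++) {
        if (r[i].hva - r[i].gpa != delta)
            return false;               /* offsets differ: more slots needed */
        if (r[i].gpa < lo)
            lo = r[i].gpa;
        if (r[i].gpa + r[i].size > hi)
            hi = r[i].gpa + r[i].size;
    }

    slot->gpa = lo;
    slot->hva = lo + delta;
    slot->size = hi - lo;
    return true;
}
```

As Eric Northup notes in the quoted reply, this layout needs host virtual address space at least as large as the guest physical range being covered.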