[ kvm-Bugs-2810749 ] linux guest framebuffer fails under vnc
Bugs item #2810749, was opened at 2009-06-23 09:23 Message generated for change (Comment added) made by elfe You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2810749&group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None >Status: Closed >Resolution: Fixed Priority: 5 Private: No Submitted By: Elfe (elfe) Assigned to: Nobody/Anonymous (nobody) Summary: linux guest framebuffer fails under vnc Initial Comment: When running a linux with framebuffer enabled the vnc server becomes unusable and causes the viewer to exit. (reconnecting to the vnc server does not work as well) The machines seems to be still up as it can respond to pings. booting the gentoo cd with the gentoo-nofb kernel works well I don't know if it's related but switching a windows guest to a higher resolution than 1024x768 caues the vnc client to exit as well. (it can be accessed again after waiting the 15 seconds for the display mode confirmation timeout) cpu: 1x Intel(R) Xeon(R) CPU X3320 @ 2.50GHz kvm version: 0.85 (0.86 is not on gentoo atm) host kernel: 2.6.29-gentoo-r5 host os/architecture: gentoo x86_64 guest os: gentoo minimal install disc for x86_64, OpenSUSE 11.1 i586 commandline: kvm -drive file=/install-amd64-minimal-2008.0-r1.iso,if=ide,media=cdrom -m 768 -vnc :35 -smp 1 -usbdevice tablet -serial none -parallel none not helping: -no-kvm-irqchip -no-kvm-pit -no-kvm the vncviewer returns errors like: vncviewer PreferredEncoding=hextile x.x.x.x:35 VNC Viewer Free Edition 4.1.3 for X - built Apr 17 2009 14:25:08 Copyright (C) 2002-2008 RealVNC Ltd. See http://www.realvnc.com for information on VNC. Tue Jun 23 09:01:06 2009 CConn: connected to host x.x.x.x port 5935 CConnection: Server supports RFB protocol version 3.8 CConnection: Using RFB protocol version 3.8 TXImage: Using default colormap and visual, TrueColor, depth 24. CConn: Using pixel format depth 6 (8bpp) rgb222 CConn: Using hextile encoding Rect too big: 1x0 at 1,1024 exceeds 1024x768 Tue Jun 23 09:01:08 2009 main:Rect too big -- >Comment By: Elfe (elfe) Date: 2009-06-24 08:14 Message: seems to be working in qemu-kvm-0.10.5 -- Comment By: Elfe (elfe) Date: 2009-06-23 13:15 Message: I was able to work around the issue on OpenSUSE by changing the VESA 800x600 at 16bit to 800x600 with 24 bit color depth by using SDL for access. But the general issue with vnc is still there and should be addressed. -- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2810749&group_id=180599 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: virtio-serial: A guest <-> host interface for simple communication
On Tue, 23 Jun 2009 10:12:31 pm Amit Shah wrote: > Hello, > > Here are two patches. One implements a virtio-serial device in qemu > and the other is the driver for a guest kernel. > > While working on a vmchannel interface that is needed for communication > between guest userspace and host userspace, I saw that most of the > interface can be abstracted out as a "serial" device with "ports". OK, I don't think the "naming" idea works though. A userspace user would have to open each one in turn to get its name. I'd stick with numbers. You also don't have dynamic creation and removal, except by hotpluging the entire device (which was on your requirements page). I'd put a size and bitmap in the configuration space, and use that to indicate what ports exist. Register on the change interrupt to get updates. Drop the control vq entirely. Cheers, Rusty. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH/RFC] virtio_test: A module for testing virtio via userspace
On Fri, 19 Jun 2009 04:57:34 pm Christian Bornträger wrote: > Hello Rusty, > > this is a result of a two month internship about virtio testing. Interesting! > I would like to get feedback on > > o the general idea of a virtio_test module > o the user interface ioctls > o further ideas and comments Not mugging real drivers would be a requirement, I think. > +config VIRTIO_TEST > + tristate "Virtio test driver (EXPERIMENTAL)" > + select VIRTIO > + select VIRTIO_RING Perhaps these should be depends? Plus, depends on EXPERIMENTAL. > + If unsure, say M. That's "N" I think. > + case VIOTEST_IOCGETBUF: > + ret = do_get_buf(vtest, (struct viotest_getbuf __user *) arg); > + break; > + case VIOTEST_IOCGETCBS: > + ret = get_callbacks(vtest, (struct viotest_cbinfo __user *) > arg); > + break; Generally the point of callbacks is to tell you you have new buffers; in fact you're insulated from callbacks which don't show new buffers. So I'm not sure these two need to be separate? In which case, a read/write interface starts to make sense (write for addbuf and kick, read for get_buf). That fits nicely with O_NONBLOCK and poll(). Cheers, Rusty. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] eventfd: add internal reference counting to fix notifier race conditions
On Tue, 23 Jun 2009 03:33:22 am Davide Libenzi wrote: > What you're doing there, is setting up a kernel-to-kernel (since > userspace only role is to create the eventfd) communication, using a file* > as accessory. That IMO is plain wrong. The most sensible is that userspace can use these fds; an in-kernel variant is possible too, but not primary IMHO. It's nice that userspace create the fds; it can then use the same fd for multiple event sources. But I didn't see anything wrong with the way eventfd used to work: you have a kvm ioctl to say "attach this eventfd to this guest notification" and that does the eventfd_fget. A detach ioctl does the fput (as does release of the kvm fd). If they close the eventfd and don't do the detach ioctl, it's their problem. Rusty. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvm: remove in_range from kvm_io_device
Michael S. Tsirkin wrote: > Remove in_range from kvm_io_device and ask read/write callbacks, if > supplied, to perform range checks internally. This allows aliasing > (mostly for in-kernel virtio), as well as better error handling by > making it possible to pass errors up to userspace. And it's enough to > look at the diffstat to see that it's a better API anyway. > > While we are at it, document locking rules for kvm_io_device. > Hi Michael, I just tried to apply this to kvm.git/master, and it blew up really badly. What tree should I be using? -Greg signature.asc Description: OpenPGP digital signature
Re: [patch] eventfd - revised interface and cleanups (2nd rev)
On Tue, 23 Jun 2009 15:49:53 -0700 (PDT) Davide Libenzi wrote: > On Tue, 23 Jun 2009, Andrew Morton wrote: > > > On Tue, 23 Jun 2009 14:25:05 -0700 (PDT) > > Davide Libenzi wrote: > > > > > On Tue, 23 Jun 2009, Andrew Morton wrote: > > > > > > > That isn't for us to decide. Entire syscalls can be disabled in config. > > > > > > That is not a well defined separate syscall though. It's a member/feature > > > of the aiocb. > > > > I don't know what this means, really. > > This is the struct iocb: > > struct iocb { > ... > u_int32_t aio_resfd; > }; > > And the only interface to access KAIO is io_submit(). > IMO the end user perceives the KAIO functionality as the full deployment > of the iocb struct and the io_submit() accessory. > Can code not using eventfd work in (AIO && !EVENTFD) mode? Sure. > It is a kinda confusing configuration from the end user POV IMO. What's confusing about it? Most programmers who are using AIO probably don't even know that the eventd hookup is available. And even if they did, they might not want to use it, to remain compatible with older kernels. I'm still not seeing any compelling reason for unconditionally adding kernel text which some users don't need. Maybe there is such a reason, and it hasn't yet been beaten into my skull. But I still think it should be done in a separate patch. One which comes with a full description of the reasons for the change, btw. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] eventfd - revised interface and cleanups (2nd rev)
On Tue, 23 Jun 2009, Andrew Morton wrote: > On Tue, 23 Jun 2009 14:25:05 -0700 (PDT) > Davide Libenzi wrote: > > > On Tue, 23 Jun 2009, Andrew Morton wrote: > > > > > That isn't for us to decide. Entire syscalls can be disabled in config. > > > > That is not a well defined separate syscall though. It's a member/feature > > of the aiocb. > > I don't know what this means, really. This is the struct iocb: struct iocb { ... u_int32_t aio_resfd; }; And the only interface to access KAIO is io_submit(). IMO the end user perceives the KAIO functionality as the full deployment of the iocb struct and the io_submit() accessory. Can code not using eventfd work in (AIO && !EVENTFD) mode? Sure. It is a kinda confusing configuration from the end user POV IMO. - Davide -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch] eventfd - revised interface and cleanups (3rd rev)
The following patch changes the eventfd interface to de-couple the eventfd memory context, from the file pointer instance. Without such change, there is no clean way to racely free handle the POLLHUP event sent when the last instance of the file* goes away. Also, now the internal eventfd APIs are using the eventfd context instead of the file*. Another cleanup this patch does, is making AIO select EVENTFD, instead of adding a bunch of empty function stubs inside eventfd.h in order to handle the (AIO && !EVENTFD) case. Signed-off-by: Davide Libenzi - Davide --- drivers/lguest/lg.h |2 drivers/lguest/lguest_user.c |4 - fs/aio.c | 24 ++-- fs/eventfd.c | 122 ++- include/linux/aio.h |4 - include/linux/eventfd.h | 18 ++ init/Kconfig |1 7 files changed, 129 insertions(+), 46 deletions(-) Index: linux-2.6.mod/fs/eventfd.c === --- linux-2.6.mod.orig/fs/eventfd.c 2009-06-21 16:54:15.0 -0700 +++ linux-2.6.mod/fs/eventfd.c 2009-06-23 14:45:54.0 -0700 @@ -14,35 +14,44 @@ #include #include #include -#include #include #include +#include +#include struct eventfd_ctx { + struct kref kref; wait_queue_head_t wqh; /* * Every time that a write(2) is performed on an eventfd, the * value of the __u64 being written is added to "count" and a * wakeup is performed on "wqh". A read(2) will return the "count" * value to userspace, and will reset "count" to zero. The kernel -* size eventfd_signal() also, adds to the "count" counter and +* side eventfd_signal() also, adds to the "count" counter and * issue a wakeup. */ __u64 count; unsigned int flags; }; -/* - * Adds "n" to the eventfd counter "count". Returns "n" in case of - * success, or a value lower then "n" in case of coutner overflow. - * This function is supposed to be called by the kernel in paths - * that do not allow sleeping. In this function we allow the counter - * to reach the ULLONG_MAX value, and we signal this as overflow - * condition by returining a POLLERR to poll(2). +/** + * eventfd_signal - Adds @n to the eventfd counter. + * @ctx: [in] Pointer to the eventfd context. + * @n: [in] Value of the counter to be added to the eventfd internal counter. + * The value cannot be negative. + * + * This function is supposed to be called by the kernel in paths that do not + * allow sleeping. In this function we allow the counter to reach the ULLONG_MAX + * value, and we signal this as overflow condition by returining a POLLERR + * to poll(2). + * + * Returns @n in case of success, a non-negative number lower than @n in case + * of overflow, or the following error codes: + * + * -EINVAL: The value of @n is negative. */ -int eventfd_signal(struct file *file, int n) +int eventfd_signal(struct eventfd_ctx *ctx, int n) { - struct eventfd_ctx *ctx = file->private_data; unsigned long flags; if (n < 0) @@ -59,9 +68,45 @@ int eventfd_signal(struct file *file, in } EXPORT_SYMBOL_GPL(eventfd_signal); +static void eventfd_free(struct kref *kref) +{ + struct eventfd_ctx *ctx = container_of(kref, struct eventfd_ctx, kref); + + kfree(ctx); +} + +/** + * eventfd_ctx_get - Acquires a reference to the internal eventfd context. + * @ctx: [in] Pointer to the eventfd context. + * + * Returns: In case of success, returns a pointer to the eventfd context. + */ +struct eventfd_ctx *eventfd_ctx_get(struct eventfd_ctx *ctx) +{ + kref_get(&ctx->kref); + return ctx; +} +EXPORT_SYMBOL_GPL(eventfd_ctx_get); + +/** + * eventfd_ctx_put - Releases a reference to the internal eventfd context. + * @ctx: [in] Pointer to eventfd context. + * + * The eventfd context reference must have been previously acquired either + * with eventfd_ctx_get() or eventfd_ctx_fdget()). + */ +void eventfd_ctx_put(struct eventfd_ctx *ctx) +{ + kref_put(&ctx->kref, eventfd_free); +} +EXPORT_SYMBOL_GPL(eventfd_ctx_put); + static int eventfd_release(struct inode *inode, struct file *file) { - kfree(file->private_data); + struct eventfd_ctx *ctx = file->private_data; + + wake_up_poll(&ctx->wqh, POLLHUP); + eventfd_ctx_put(ctx); return 0; } @@ -185,6 +230,16 @@ static const struct file_operations even .write = eventfd_write, }; +/** + * eventfd_fget - Acquire a reference of an eventfd file descriptor. + * @fd: [in] Eventfd file descriptor. + * + * Returns a pointer to the eventfd file structure in case of success, or the + * following error pointer: + * + * -EBADF: Invalid @fd file descriptor. + * -EINVAL : The @fd file descriptor is not an eventfd file. + */ struct file *eventfd_fget(int fd) { struct file *file; @@ -201,
Re: [patch] eventfd - revised interface and cleanups (2nd rev)
On Tue, 23 Jun 2009 14:48:51 -0700 (PDT) Davide Libenzi wrote: > > > This becomes pretty painful when the function calls other functions, for > > > which just relays the error code. > > > Should we be just documenting the error codes introduced by the function > > > code, and say that the function returns errors A, B, C plus all the ones > > > returned by the called functions X, Y, Z? > > > If not, it becomes hell in maintaining the comments... > > > > Well. Don't worry about making rules. Taste and common sense apply. "Will > > it be useful to readers if I explicitly document the return value". If > > "yes" then document away. If "no" then don't. > > Are you OK with the format in the patch below? Looks great to me. Obviously the cost of maintaining this level of detail is fairly high, and the chances of bitrot are also high. So I wouldn't be expecting people to document things at this level in general. But if you're prepared to maintain this then good for you. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] eventfd - revised interface and cleanups (2nd rev)
On Tue, 23 Jun 2009, Andrew Morton wrote: > On Tue, 23 Jun 2009 14:34:50 -0700 (PDT) > Davide Libenzi wrote: > > > On Tue, 23 Jun 2009, Andrew Morton wrote: > > > > > > Should functions be describing all the returned error codes, ala man > > > > pages? > > > > > > > > > > I think so. > > > > This becomes pretty painful when the function calls other functions, for > > which just relays the error code. > > Should we be just documenting the error codes introduced by the function > > code, and say that the function returns errors A, B, C plus all the ones > > returned by the called functions X, Y, Z? > > If not, it becomes hell in maintaining the comments... > > Well. Don't worry about making rules. Taste and common sense apply. "Will > it be useful to readers if I explicitly document the return value". If > "yes" then document away. If "no" then don't. Are you OK with the format in the patch below? - Davide --- drivers/lguest/lg.h |2 drivers/lguest/lguest_user.c |4 - fs/aio.c | 24 ++-- fs/eventfd.c | 122 ++- include/linux/aio.h |4 - include/linux/eventfd.h | 18 ++ init/Kconfig |1 7 files changed, 129 insertions(+), 46 deletions(-) Index: linux-2.6.mod/fs/eventfd.c === --- linux-2.6.mod.orig/fs/eventfd.c 2009-06-21 16:54:15.0 -0700 +++ linux-2.6.mod/fs/eventfd.c 2009-06-23 14:45:54.0 -0700 @@ -14,35 +14,44 @@ #include #include #include -#include #include #include +#include +#include struct eventfd_ctx { + struct kref kref; wait_queue_head_t wqh; /* * Every time that a write(2) is performed on an eventfd, the * value of the __u64 being written is added to "count" and a * wakeup is performed on "wqh". A read(2) will return the "count" * value to userspace, and will reset "count" to zero. The kernel -* size eventfd_signal() also, adds to the "count" counter and +* side eventfd_signal() also, adds to the "count" counter and * issue a wakeup. */ __u64 count; unsigned int flags; }; -/* - * Adds "n" to the eventfd counter "count". Returns "n" in case of - * success, or a value lower then "n" in case of coutner overflow. - * This function is supposed to be called by the kernel in paths - * that do not allow sleeping. In this function we allow the counter - * to reach the ULLONG_MAX value, and we signal this as overflow - * condition by returining a POLLERR to poll(2). +/** + * eventfd_signal - Adds @n to the eventfd counter. + * @ctx: [in] Pointer to the eventfd context. + * @n: [in] Value of the counter to be added to the eventfd internal counter. + * The value cannot be negative. + * + * This function is supposed to be called by the kernel in paths that do not + * allow sleeping. In this function we allow the counter to reach the ULLONG_MAX + * value, and we signal this as overflow condition by returining a POLLERR + * to poll(2). + * + * Returns @n in case of success, a non-negative number lower than @n in case + * of overflow, or the following error codes: + * + * -EINVAL: The value of @n is negative. */ -int eventfd_signal(struct file *file, int n) +int eventfd_signal(struct eventfd_ctx *ctx, int n) { - struct eventfd_ctx *ctx = file->private_data; unsigned long flags; if (n < 0) @@ -59,9 +68,45 @@ int eventfd_signal(struct file *file, in } EXPORT_SYMBOL_GPL(eventfd_signal); +static void eventfd_free(struct kref *kref) +{ + struct eventfd_ctx *ctx = container_of(kref, struct eventfd_ctx, kref); + + kfree(ctx); +} + +/** + * eventfd_ctx_get - Acquires a reference to the internal eventfd context. + * @ctx: [in] Pointer to the eventfd context. + * + * Returns: In case of success, returns a pointer to the eventfd context. + */ +struct eventfd_ctx *eventfd_ctx_get(struct eventfd_ctx *ctx) +{ + kref_get(&ctx->kref); + return ctx; +} +EXPORT_SYMBOL_GPL(eventfd_ctx_get); + +/** + * eventfd_ctx_put - Releases a reference to the internal eventfd context. + * @ctx: [in] Pointer to eventfd context. + * + * The eventfd context reference must have been previously acquired either + * with eventfd_ctx_get() or eventfd_ctx_fdget()). + */ +void eventfd_ctx_put(struct eventfd_ctx *ctx) +{ + kref_put(&ctx->kref, eventfd_free); +} +EXPORT_SYMBOL_GPL(eventfd_ctx_put); + static int eventfd_release(struct inode *inode, struct file *file) { - kfree(file->private_data); + struct eventfd_ctx *ctx = file->private_data; + + wake_up_poll(&ctx->wqh, POLLHUP); + eventfd_ctx_put(ctx); return 0; } @@ -185,6 +230,16 @@ static const struct file_operations even .write = eventfd_write, }; +/** + * eventfd_fget - Acquire a reference of an e
Re: [patch] eventfd - revised interface and cleanups (2nd rev)
On Tue, 23 Jun 2009 14:34:50 -0700 (PDT) Davide Libenzi wrote: > On Tue, 23 Jun 2009, Andrew Morton wrote: > > > > Should functions be describing all the returned error codes, ala man > > > pages? > > > > > > > I think so. > > This becomes pretty painful when the function calls other functions, for > which just relays the error code. > Should we be just documenting the error codes introduced by the function > code, and say that the function returns errors A, B, C plus all the ones > returned by the called functions X, Y, Z? > If not, it becomes hell in maintaining the comments... Well. Don't worry about making rules. Taste and common sense apply. "Will it be useful to readers if I explicitly document the return value". If "yes" then document away. If "no" then don't. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] eventfd - revised interface and cleanups (2nd rev)
On Tue, 23 Jun 2009 14:25:05 -0700 (PDT) Davide Libenzi wrote: > On Tue, 23 Jun 2009, Andrew Morton wrote: > > > That isn't for us to decide. Entire syscalls can be disabled in config. > > That is not a well defined separate syscall though. It's a member/feature > of the aiocb. I don't know what this means, really. AIO eventfd support is a relatively recently added enhancement to AIO, is it not? Applications which continue to use the pre-May07 AIO features will continue to work as before (they darn well better). So for such applications, AIO=y/EVENTFD=n kernels are usable and useful, and eliminating this option is a loss? Either way, I believe that this change should be unbundled from the unrelated KVM one so we can handle it separately. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] eventfd - revised interface and cleanups (2nd rev)
On Tue, 23 Jun 2009, Andrew Morton wrote: > > Should functions be describing all the returned error codes, ala man pages? > > > > I think so. This becomes pretty painful when the function calls other functions, for which just relays the error code. Should we be just documenting the error codes introduced by the function code, and say that the function returns errors A, B, C plus all the ones returned by the called functions X, Y, Z? If not, it becomes hell in maintaining the comments... - Davide -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] eventfd - revised interface and cleanups (2nd rev)
On Tue, 23 Jun 2009, Andrew Morton wrote: > > > > +struct eventfd_ctx *eventfd_ctx_get(struct eventfd_ctx *ctx) > > > > +{ > > > > + kref_get(&ctx->kref); > > > > + return ctx; > > > > +} > > > > +EXPORT_SYMBOL_GPL(eventfd_ctx_get); > > > > > > doesn't match the code. > > You appear to have not seen the above sentence. I saw that and have already fixed it. Sorry I missed the ACK. > yeah, I never trust people. You might lose the email or jump on a > plane and disappear for three weeks, then it all gets forgotten about > and lost. Jumping on a plane. Check. Disappearing for 6 weeks. Check. ;) - Davide -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] eventfd - revised interface and cleanups (2nd rev)
On Tue, 23 Jun 2009, Andrew Morton wrote: > That isn't for us to decide. Entire syscalls can be disabled in config. That is not a well defined separate syscall though. It's a member/feature of the aiocb. - Davide -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] eventfd - revised interface and cleanups (2nd rev)
On Tue, 23 Jun 2009 14:03:22 -0700 (PDT) Davide Libenzi wrote: > On Tue, 23 Jun 2009, Andrew Morton wrote: > > > On Tue, 23 Jun 2009 12:25:36 -0700 (PDT) > > Davide Libenzi wrote: > > > > > The following patch changes the eventfd interface to de-couple the > > > eventfd > > > memory context, from the file pointer instance. > > > Without such change, there is no clean way to racely free handle the > > > POLLHUP event sent when the last instance of the file* goes away. > > > Also, now the internal eventfd APIs are using the eventfd context instead > > > of the file*. > > > Another cleanup this patch does, is making AIO select EVENTFD, instead of > > > adding a bunch of empty function stubs inside eventfd.h in order to > > > handle the (AIO && !EVENTFD) case. > > > > > > ... > > > > > > +/** > > > + * eventfd_ctx_get - Acquires a reference to the internal eventfd > > > context. > > > + * @ctx: [in] Pointer to the eventfd context. > > > + * > > > + * Returns: In case of success, returns a pointer to the eventfd context, > > > + * otherwise a proper error code. > > > > The description of the return value > > Should functions be describing all the returned error codes, ala man pages? > I think so. > > > > + */ > > > +struct eventfd_ctx *eventfd_ctx_get(struct eventfd_ctx *ctx) > > > +{ > > > + kref_get(&ctx->kref); > > > + return ctx; > > > +} > > > +EXPORT_SYMBOL_GPL(eventfd_ctx_get); > > > > doesn't match the code. You appear to have not seen the above sentence. > > Also... > > > > > + * Returns: A pointer to the eventfd file structure in case of success, > > > or a > > > + * proper error pointer in case of failure. > > > > > > > + * Returns: In case of success, it returns a pointer to the internal > > > eventfd > > > + * context, otherwise a proper error code. > > > + */ > > > > I'm unsure what the word "proper" means in this context. > > > > The term "proper error pointer" is understandable enough - something > > you run IS_ERR() against. "error pointer" would suffice. > > > > But the term "proper error code" is getting a bit remote from reality. > > > > Unfortunately the kernel doesn't have a simple and agreed-to term for > > an ERR_PTR() thingy. Perhaps we should invent one. "err_ptr"? > > OK, but you tricked me once again :) > You posted your comments/changes while you merged the old version in -mm > already. yeah, I never trust people. You might lose the email or jump on a plane and disappear for three weeks, then it all gets forgotten about and lost. If the code doesn't have any apparent showstoppers I'll often merge it with a note that it isn't finalised. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] eventfd - revised interface and cleanups (2nd rev)
On Tue, 23 Jun 2009 13:59:07 -0700 (PDT) Davide Libenzi wrote: > On Tue, 23 Jun 2009, Andrew Morton wrote: > > > On Tue, 23 Jun 2009 12:25:36 -0700 (PDT) > > Davide Libenzi wrote: > > > > > Another cleanup this patch does, is making AIO select EVENTFD, instead of > > > adding a bunch of empty function stubs inside eventfd.h in order to > > > handle the (AIO && !EVENTFD) case. > > > > Given that we're trying to squeak this patch into 2.6.31, it's quite > > unfortunate to include unrelated changes. Especially ones which > > involve the always-problematic "select". Although this one looks less > > than usually deadly due to the simple dependencies of AIO and eventfd. > > > > However... > > > > Is this a good way of fixing the (AIO && !EVENTFD) case? Surely if the > > developer selected this combination, he doesn't want to carry the > > overhead of including eventfd. IOW, if AIO can work acceptably without > > eventfd being compiled into the kernel then adding the stubs is a > > superior solution. > > IMO when someone says "AIO included in the kernel", should get the whole > AIO functionality and not part of it. > People already started using KAIO+eventfd, and a (AIO && !EVENTFD) config > will make their code fail. That isn't for us to decide. Entire syscalls can be disabled in config. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] eventfd - revised interface and cleanups (2nd rev)
On Tue, 23 Jun 2009, Andrew Morton wrote: > On Tue, 23 Jun 2009 12:25:36 -0700 (PDT) > Davide Libenzi wrote: > > > The following patch changes the eventfd interface to de-couple the eventfd > > memory context, from the file pointer instance. > > Without such change, there is no clean way to racely free handle the > > POLLHUP event sent when the last instance of the file* goes away. > > Also, now the internal eventfd APIs are using the eventfd context instead > > of the file*. > > Another cleanup this patch does, is making AIO select EVENTFD, instead of > > adding a bunch of empty function stubs inside eventfd.h in order to > > handle the (AIO && !EVENTFD) case. > > > > ... > > > > +/** > > + * eventfd_ctx_get - Acquires a reference to the internal eventfd context. > > + * @ctx: [in] Pointer to the eventfd context. > > + * > > + * Returns: In case of success, returns a pointer to the eventfd context, > > + * otherwise a proper error code. > > The description of the return value Should functions be describing all the returned error codes, ala man pages? > > + */ > > +struct eventfd_ctx *eventfd_ctx_get(struct eventfd_ctx *ctx) > > +{ > > + kref_get(&ctx->kref); > > + return ctx; > > +} > > +EXPORT_SYMBOL_GPL(eventfd_ctx_get); > > doesn't match the code. > > Also... > > > + * Returns: A pointer to the eventfd file structure in case of success, or > > a > > + * proper error pointer in case of failure. > > > > + * Returns: In case of success, it returns a pointer to the internal > > eventfd > > + * context, otherwise a proper error code. > > + */ > > I'm unsure what the word "proper" means in this context. > > The term "proper error pointer" is understandable enough - something > you run IS_ERR() against. "error pointer" would suffice. > > But the term "proper error code" is getting a bit remote from reality. > > Unfortunately the kernel doesn't have a simple and agreed-to term for > an ERR_PTR() thingy. Perhaps we should invent one. "err_ptr"? OK, but you tricked me once again :) You posted your comments/changes while you merged the old version in -mm already. - Davide -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] eventfd - revised interface and cleanups (2nd rev)
On Tue, 23 Jun 2009, Andrew Morton wrote: > On Tue, 23 Jun 2009 12:25:36 -0700 (PDT) > Davide Libenzi wrote: > > > Another cleanup this patch does, is making AIO select EVENTFD, instead of > > adding a bunch of empty function stubs inside eventfd.h in order to > > handle the (AIO && !EVENTFD) case. > > Given that we're trying to squeak this patch into 2.6.31, it's quite > unfortunate to include unrelated changes. Especially ones which > involve the always-problematic "select". Although this one looks less > than usually deadly due to the simple dependencies of AIO and eventfd. > > However... > > Is this a good way of fixing the (AIO && !EVENTFD) case? Surely if the > developer selected this combination, he doesn't want to carry the > overhead of including eventfd. IOW, if AIO can work acceptably without > eventfd being compiled into the kernel then adding the stubs is a > superior solution. IMO when someone says "AIO included in the kernel", should get the whole AIO functionality and not part of it. People already started using KAIO+eventfd, and a (AIO && !EVENTFD) config will make their code fail. - Davide -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] eventfd - revised interface and cleanups (2nd rev)
On Tue, 23 Jun 2009 12:25:36 -0700 (PDT) Davide Libenzi wrote: > The following patch changes the eventfd interface to de-couple the eventfd > memory context, from the file pointer instance. > Without such change, there is no clean way to racely free handle the > POLLHUP event sent when the last instance of the file* goes away. > Also, now the internal eventfd APIs are using the eventfd context instead > of the file*. > Another cleanup this patch does, is making AIO select EVENTFD, instead of > adding a bunch of empty function stubs inside eventfd.h in order to > handle the (AIO && !EVENTFD) case. > > > ... > > +/** > + * eventfd_ctx_get - Acquires a reference to the internal eventfd context. > + * @ctx: [in] Pointer to the eventfd context. > + * > + * Returns: In case of success, returns a pointer to the eventfd context, > + * otherwise a proper error code. The description of the return value > + */ > +struct eventfd_ctx *eventfd_ctx_get(struct eventfd_ctx *ctx) > +{ > + kref_get(&ctx->kref); > + return ctx; > +} > +EXPORT_SYMBOL_GPL(eventfd_ctx_get); doesn't match the code. Also... > + * Returns: A pointer to the eventfd file structure in case of success, or a > + * proper error pointer in case of failure. > + * Returns: In case of success, it returns a pointer to the internal eventfd > + * context, otherwise a proper error code. > + */ I'm unsure what the word "proper" means in this context. The term "proper error pointer" is understandable enough - something you run IS_ERR() against. "error pointer" would suffice. But the term "proper error code" is getting a bit remote from reality. Unfortunately the kernel doesn't have a simple and agreed-to term for an ERR_PTR() thingy. Perhaps we should invent one. "err_ptr"? -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] eventfd - revised interface and cleanups (2nd rev)
On Tue, 23 Jun 2009 12:25:36 -0700 (PDT) Davide Libenzi wrote: > Another cleanup this patch does, is making AIO select EVENTFD, instead of > adding a bunch of empty function stubs inside eventfd.h in order to > handle the (AIO && !EVENTFD) case. Given that we're trying to squeak this patch into 2.6.31, it's quite unfortunate to include unrelated changes. Especially ones which involve the always-problematic "select". Although this one looks less than usually deadly due to the simple dependencies of AIO and eventfd. However... Is this a good way of fixing the (AIO && !EVENTFD) case? Surely if the developer selected this combination, he doesn't want to carry the overhead of including eventfd. IOW, if AIO can work acceptably without eventfd being compiled into the kernel then adding the stubs is a superior solution. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] eventfd - revised interface and cleanups (2nd rev)
On Tue, 23 Jun 2009, Andrew Morton wrote: > On Tue, 23 Jun 2009 12:25:36 -0700 (PDT) > Davide Libenzi wrote: > > > The following patch changes the eventfd interface to de-couple the eventfd > > memory context, from the file pointer instance. > > Without such change, there is no clean way to racely free handle the > > POLLHUP event sent when the last instance of the file* goes away. > > Also, now the internal eventfd APIs are using the eventfd context instead > > of the file*. > > Another cleanup this patch does, is making AIO select EVENTFD, instead of > > adding a bunch of empty function stubs inside eventfd.h in order to > > handle the (AIO && !EVENTFD) case. > > > > If we really want to squeeze this into 2.6.31 then it would be helpful > to have justification in the changelog, please. I see that some KVM > feature needs it, but what and why and why now, etc? The patch in -next added the ability to have waiters to be notified when the last instance of the file* is dropped (that is, on ->release). But it is not possible for waiters to handle the POLLHUP event in a racy-free way, unless the eventfd memory context is de-coupled from the file*. The next patches from gregory will use eventfd (and the POLLHUP handling) to implement the IRQfd code inside KVM. > What's going on here? The POLLHUP patch in -next is not sufficent. I based the patch over mainline, but if you guys want I can rebase it over -next. - Davide -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] eventfd - revised interface and cleanups (2nd rev)
On Tue, 23 Jun 2009 12:25:36 -0700 (PDT) Davide Libenzi wrote: > The following patch changes the eventfd interface to de-couple the eventfd > memory context, from the file pointer instance. > Without such change, there is no clean way to racely free handle the > POLLHUP event sent when the last instance of the file* goes away. > Also, now the internal eventfd APIs are using the eventfd context instead > of the file*. > Another cleanup this patch does, is making AIO select EVENTFD, instead of > adding a bunch of empty function stubs inside eventfd.h in order to > handle the (AIO && !EVENTFD) case. > If we really want to squeeze this into 2.6.31 then it would be helpful to have justification in the changelog, please. I see that some KVM feature needs it, but what and why and why now, etc? The patch conflicts with similar changes int he KVM tree, but that'll just ruin sfr's life for a day. Your patch has: > ... > static int eventfd_release(struct inode *inode, struct file *file) > { > - kfree(file->private_data); > + struct eventfd_ctx *ctx = file->private_data; > + > + wake_up_poll(&ctx->wqh, POLLHUP); > + eventfd_ctx_put(ctx); > return 0; > } but the patch in linux-next has static int eventfd_release(struct inode *inode, struct file *file) { struct eventfd_ctx *ctx = file->private_data; /* * No need to hold the lock here, since we are on the file cleanup * path and the ones still attached to the wait queue will be * serialized by wake_up_locked_poll(). */ wake_up_locked_poll(&ctx->wqh, POLLHUP); kfree(ctx); return 0; } which looks quite different (and has a nice comment!) What's going on here? -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch] eventfd - revised interface and cleanups (2nd rev)
The following patch changes the eventfd interface to de-couple the eventfd memory context, from the file pointer instance. Without such change, there is no clean way to racely free handle the POLLHUP event sent when the last instance of the file* goes away. Also, now the internal eventfd APIs are using the eventfd context instead of the file*. Another cleanup this patch does, is making AIO select EVENTFD, instead of adding a bunch of empty function stubs inside eventfd.h in order to handle the (AIO && !EVENTFD) case. Signed-off-by: Davide Libenzi - Davide --- drivers/lguest/lg.h |2 drivers/lguest/lguest_user.c |4 - fs/aio.c | 24 ++-- fs/eventfd.c | 115 ++- include/linux/aio.h |4 - include/linux/eventfd.h | 18 ++ init/Kconfig |1 7 files changed, 122 insertions(+), 46 deletions(-) Index: linux-2.6.mod/fs/eventfd.c === --- linux-2.6.mod.orig/fs/eventfd.c 2009-06-21 16:54:15.0 -0700 +++ linux-2.6.mod/fs/eventfd.c 2009-06-23 10:57:58.0 -0700 @@ -14,35 +14,41 @@ #include #include #include -#include #include #include +#include +#include struct eventfd_ctx { + struct kref kref; wait_queue_head_t wqh; /* * Every time that a write(2) is performed on an eventfd, the * value of the __u64 being written is added to "count" and a * wakeup is performed on "wqh". A read(2) will return the "count" * value to userspace, and will reset "count" to zero. The kernel -* size eventfd_signal() also, adds to the "count" counter and +* side eventfd_signal() also, adds to the "count" counter and * issue a wakeup. */ __u64 count; unsigned int flags; }; -/* - * Adds "n" to the eventfd counter "count". Returns "n" in case of - * success, or a value lower then "n" in case of coutner overflow. - * This function is supposed to be called by the kernel in paths - * that do not allow sleeping. In this function we allow the counter - * to reach the ULLONG_MAX value, and we signal this as overflow - * condition by returining a POLLERR to poll(2). +/** + * eventfd_signal - Adds @n to the eventfd counter. + * @ctx: [in] Pointer to the eventfd context. + * @n: [in] Value of the counter to be added to the eventfd internal counter. + * + * This function is supposed to be called by the kernel in paths that do not + * allow sleeping. In this function we allow the counter to reach the ULLONG_MAX + * value, and we signal this as overflow condition by returining a POLLERR + * to poll(2). + * + * Returns: In case of success, it returns @n, otherwise (in case of overflow + * of the eventfd 64bit internal counter) a value lower than @n. */ -int eventfd_signal(struct file *file, int n) +int eventfd_signal(struct eventfd_ctx *ctx, int n) { - struct eventfd_ctx *ctx = file->private_data; unsigned long flags; if (n < 0) @@ -59,9 +65,48 @@ int eventfd_signal(struct file *file, in } EXPORT_SYMBOL_GPL(eventfd_signal); +static void eventfd_free(struct kref *kref) +{ + struct eventfd_ctx *ctx = container_of(kref, struct eventfd_ctx, kref); + + kfree(ctx); +} + +/** + * eventfd_ctx_get - Acquires a reference to the internal eventfd context. + * @ctx: [in] Pointer to the eventfd context. + * + * Returns: In case of success, returns a pointer to the eventfd context, + * otherwise a proper error code. + */ +struct eventfd_ctx *eventfd_ctx_get(struct eventfd_ctx *ctx) +{ + kref_get(&ctx->kref); + return ctx; +} +EXPORT_SYMBOL_GPL(eventfd_ctx_get); + +/** + * eventfd_ctx_put - Releases a reference to the internal eventfd context. + * @ctx: [in] Pointer to eventfd context. + * + * The eventfd context reference must have been previously acquired either + * with eventfd_ctx_get() or eventfd_ctx_fdget()). + * + * Returns: Nothing. + */ +void eventfd_ctx_put(struct eventfd_ctx *ctx) +{ + kref_put(&ctx->kref, eventfd_free); +} +EXPORT_SYMBOL_GPL(eventfd_ctx_put); + static int eventfd_release(struct inode *inode, struct file *file) { - kfree(file->private_data); + struct eventfd_ctx *ctx = file->private_data; + + wake_up_poll(&ctx->wqh, POLLHUP); + eventfd_ctx_put(ctx); return 0; } @@ -185,6 +230,13 @@ static const struct file_operations even .write = eventfd_write, }; +/** + * eventfd_fget - Acquire a reference of an eventfd file descriptor. + * @fd: [in] Eventfd file descriptor. + * + * Returns: A pointer to the eventfd file structure in case of success, or a + * proper error pointer in case of failure. + */ struct file *eventfd_fget(int fd) { struct file *file; @@ -201,6 +253,44 @@ struct file *eventfd_fget(int fd) } EXPORT_SYMBOL_GPL(eventfd_fget);
KVM: x86: missing locking in PIT/IRQCHIP/SET_BSP_CPU ioctl paths
Correct missing locking in a few places in x86's vm_ioctl handling path. Signed-off-by: Marcelo Tosatti Index: kvm/arch/x86/kvm/x86.c === --- kvm.orig/arch/x86/kvm/x86.c +++ kvm/arch/x86/kvm/x86.c @@ -1951,19 +1951,25 @@ static int kvm_vm_ioctl_set_irqchip(stru r = 0; switch (chip->chip_id) { case KVM_IRQCHIP_PIC_MASTER: + spin_lock(&pic_irqchip(kvm)->lock); memcpy(&pic_irqchip(kvm)->pics[0], &chip->chip.pic, sizeof(struct kvm_pic_state)); + spin_unlock(&pic_irqchip(kvm)->lock); break; case KVM_IRQCHIP_PIC_SLAVE: + spin_lock(&pic_irqchip(kvm)->lock); memcpy(&pic_irqchip(kvm)->pics[1], &chip->chip.pic, sizeof(struct kvm_pic_state)); + spin_unlock(&pic_irqchip(kvm)->lock); break; case KVM_IRQCHIP_IOAPIC: + mutex_lock(&kvm->irq_lock); memcpy(ioapic_irqchip(kvm), &chip->chip.ioapic, sizeof(struct kvm_ioapic_state)); + mutex_unlock(&kvm->irq_lock); break; default: r = -EINVAL; @@ -1977,7 +1983,9 @@ static int kvm_vm_ioctl_get_pit(struct k { int r = 0; + mutex_lock(&kvm->arch.vpit->pit_state.lock); memcpy(ps, &kvm->arch.vpit->pit_state, sizeof(struct kvm_pit_state)); + mutex_unlock(&kvm->arch.vpit->pit_state.lock); return r; } @@ -1985,8 +1993,10 @@ static int kvm_vm_ioctl_set_pit(struct k { int r = 0; + mutex_lock(&kvm->arch.vpit->pit_state.lock); memcpy(&kvm->arch.vpit->pit_state, ps, sizeof(struct kvm_pit_state)); kvm_pit_load_count(kvm, 0, ps->channels[0].count); + mutex_unlock(&kvm->arch.vpit->pit_state.lock); return r; } @@ -1995,7 +2005,9 @@ static int kvm_vm_ioctl_reinject(struct { if (!kvm->arch.vpit) return -ENXIO; + mutex_lock(&kvm->arch.vpit->pit_state.lock); kvm->arch.vpit->pit_state.pit_timer.reinject = control->pit_reinject; + mutex_unlock(&kvm->arch.vpit->pit_state.lock); return 0; } Index: kvm/arch/x86/kvm/i8254.c === --- kvm.orig/arch/x86/kvm/i8254.c +++ kvm/arch/x86/kvm/i8254.c @@ -342,9 +342,7 @@ static void pit_load_count(struct kvm *k void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val) { - mutex_lock(&kvm->arch.vpit->pit_state.lock); pit_load_count(kvm, channel, val); - mutex_unlock(&kvm->arch.vpit->pit_state.lock); } static inline struct kvm_pit *dev_to_pit(struct kvm_io_device *dev) Index: kvm/virt/kvm/kvm_main.c === --- kvm.orig/virt/kvm/kvm_main.c +++ kvm/virt/kvm/kvm_main.c @@ -2248,10 +2248,12 @@ static long kvm_vm_ioctl(struct file *fi #ifdef CONFIG_KVM_APIC_ARCHITECTURE case KVM_SET_BOOT_CPU_ID: r = 0; + mutex_lock(&kvm->lock); if (atomic_read(&kvm->online_vcpus) != 0) r = -EBUSY; else kvm->bsp_vcpu_id = arg; + mutex_unlock(&kvm->lock); break; #endif default: -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] eventfd - revised interface and cleanups
On Tue, 23 Jun 2009, Gregory Haskins wrote: > I think the lack of a way to got from a file* to a ctx* (or back) might > be a problem. I am not an expert, but if I understood the gist of what > Al Viro (cc'd) was saying early on, an fd can change meaning out from > under you without warning. > > With the code the way it is now, I would need to call both > eventfd_fget() and eventfd_ctx_fdget() to fully acquire state (unless I > am missing something). I would think that is a race and I am not > guaranteed to get the same object. Can we also have a way to convert > one reference to the other? I am thinking > > struct eventfd_ctx *eventfd_ctx_fget(struct file *) > > (or similar) would be sufficient. You're right. That function was available in the previous patch, but then I dropped it because I thought the eventfd code could be simplified. But yes, it's needed. - Davide -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] eventfd - revised interface and cleanups
Davide Libenzi wrote: > The following patch changes the eventfd interface to de-couple the eventfd > memory context, from the file pointer instance. > Without such change, there is no clean way to racely free handle the > POLLHUP event sent when the last instance of the file* goes away. > Also, now the internal eventfd APIs are using the eventfd context instead > of the file*. > Another cleanup this patch does, is making AIO select EVENTFD, instead of > adding a bunch of empty function stubs inside eventfd.h. > > Andrew, this better go via Avi and the KVM tree, since they have patches > that will be based on the new interface. > > > > Signed-off-by: Davide Libenzi > > > - Davide > > > --- > drivers/lguest/lg.h |2 > drivers/lguest/lguest_user.c |4 - > fs/aio.c | 24 ++ > fs/eventfd.c | 101 > ++- > include/linux/aio.h |4 - > include/linux/eventfd.h | 17 ++- > init/Kconfig |1 > 7 files changed, 108 insertions(+), 45 deletions(-) > > Index: linux-2.6.mod/fs/eventfd.c > === > --- linux-2.6.mod.orig/fs/eventfd.c 2009-06-21 16:54:15.0 -0700 > +++ linux-2.6.mod/fs/eventfd.c2009-06-23 09:34:42.0 -0700 > @@ -17,32 +17,38 @@ > #include > #include > #include > +#include > > struct eventfd_ctx { > + struct kref kref; > wait_queue_head_t wqh; > /* >* Every time that a write(2) is performed on an eventfd, the >* value of the __u64 being written is added to "count" and a >* wakeup is performed on "wqh". A read(2) will return the "count" >* value to userspace, and will reset "count" to zero. The kernel > - * size eventfd_signal() also, adds to the "count" counter and > + * side eventfd_signal() also, adds to the "count" counter and >* issue a wakeup. >*/ > __u64 count; > unsigned int flags; > }; > > -/* > - * Adds "n" to the eventfd counter "count". Returns "n" in case of > - * success, or a value lower then "n" in case of coutner overflow. > - * This function is supposed to be called by the kernel in paths > - * that do not allow sleeping. In this function we allow the counter > - * to reach the ULLONG_MAX value, and we signal this as overflow > - * condition by returining a POLLERR to poll(2). > +/** > + * eventfd_signal - Adds @n to the eventfd counter. This function is > + * supposed to be called by the kernel in paths that do not > + * allow sleeping. In this function we allow the counter > + * to reach the ULLONG_MAX value, and we signal this as > + * overflow condition by returining a POLLERR to poll(2). > + * > + * @ctx: [in] Pointer to the eventfd context. > + * @n: [in] Value of the counter to be added to the eventfd internal counter. > + * > + * Returns: In case of success, it returns @n, otherwise (in case of overflow > + * of the eventfd 64bit internal counter) a value lower than @n. > */ > -int eventfd_signal(struct file *file, int n) > +int eventfd_signal(struct eventfd_ctx *ctx, int n) > { > - struct eventfd_ctx *ctx = file->private_data; > unsigned long flags; > > if (n < 0) > @@ -59,9 +65,49 @@ int eventfd_signal(struct file *file, in > } > EXPORT_SYMBOL_GPL(eventfd_signal); > > +static void eventfd_free(struct kref *kref) > +{ > + struct eventfd_ctx *ctx = container_of(kref, struct eventfd_ctx, kref); > + > + kfree(ctx); > +} > + > +/** > + * eventfd_ctx_get - Acquires a reference to the internal eventfd context. > + * > + * @ctx: [in] Pointer to the eventfd context. > + * > + * Returns: In case of success, returns a pointer to the eventfd context, > + * otherwise a proper error code. > + */ > +struct eventfd_ctx *eventfd_ctx_get(struct eventfd_ctx *ctx) > +{ > + kref_get(&ctx->kref); > + return ctx; > +} > +EXPORT_SYMBOL_GPL(eventfd_ctx_get); > + > +/** > + * eventfd_ctx_put - Releases a reference to the internal eventfd context > + * (previously acquired either with eventfd_ctx_get() or > + * eventfd_ctx_fdget()). > + * > + * @ctx: [in] Pointer to eventfd context. > + * > + * Returns: Nothing. > + */ > +void eventfd_ctx_put(struct eventfd_ctx *ctx) > +{ > + kref_put(&ctx->kref, eventfd_free); > +} > +EXPORT_SYMBOL_GPL(eventfd_ctx_put); > + > static int eventfd_release(struct inode *inode, struct file *file) > { > - kfree(file->private_data); > + struct eventfd_ctx *ctx = file->private_data; > + > + wake_up_poll(&ctx->wqh, POLLHUP); > + eventfd_ctx_put(ctx); > return 0; > } > > @@ -185,6 +231,14 @@ static const struct file_operations even > .write = eventfd_write, > }; > > +/** > + * eventfd_fget - Acquire a reference of an eventfd file descriptor.
virtio_blk fails for rhel5.3 guest with LVM: kernel panic
I see that Rhel5.3 32 and 64 bit guests fail to boot with virtio for block device. The guest can not find the root filesystem. The guest is using LVM. As per the linux-kvm wiki instructions I did change device.map and booted with if=virtio. The wiki link is http://www.linux-kvm.org/page/Boot_from_virtio_block_device qemu command: # qemu-system-x86_64 -drive file=rhel5-32.raw,if=virtio,boot=on -m 2048 -smp 4 -net nic,model=rtl8139,macaddr=00:FF:FE:00:00:64 -net tap,script=/root/qemu-ifup-latest -name 32virtio -vnc :2 -watchdog i6300esb -watchdog-action reset & The image was installed using if=ide. The messages before panic shows that the guest was unable to mount root file system as below. "Volume group Volgroup00 not found mount: could not find filesystem /dev/root setuproot: moving /dev failed: no such file or directory setuproot: mounting /proc failed: no such file or directory setuproot: mounting /sys failed: no such file or directory switchroot: mount failed: no such file or directory kernle panic- not syncing: Attempted to kill init " I see similar messages for 64 bit guest as well. This is the information from within guest: [r...@ichigo-dom101 ~]# mount /dev/mapper/VolGroup00-LogVol00 on / type ext3 (rw) proc on /proc type proc (rw) sysfs on /sys type sysfs (rw) devpts on /dev/pts type devpts (rw,gid=5,mode=620) /dev/hda1 on /boot type ext3 (rw) tmpfs on /dev/shm type tmpfs (rw) none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw) sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw) [r...@ichigo-dom101 ~]# fdisk -l Disk /dev/hda: 10.7 GB, 10737418240 bytes 255 heads, 63 sectors/track, 1305 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Device Boot Start End Blocks Id System /dev/hda1 * 1 13 104391 83 Linux /dev/hda2 14130510377990 8e Linux LVM [r...@ichigo-dom101 ~]# cat /boot/grub/device.map # this device map was generated by anaconda (hd0) /dev/vda [r...@ichigo-dom101 ~]# cat /boot/grub/menu.lst # grub.conf generated by anaconda # # Note that you do not have to rerun grub after making changes to this file # NOTICE: You have a /boot partition. This means that # all kernel and initrd paths are relative to /boot/, eg. # root (hd0,0) # kernel /vmlinuz-version ro root=/dev/VolGroup00/LogVol00 # initrd /initrd-version.img #boot=/dev/hda default=0 timeout=5 splashimage=(hd0,0)/grub/splash.xpm.gz hiddenmenu title Red Hat Enterprise Linux Server (2.6.18-128.el5PAE) root (hd0,0) kernel /vmlinuz-2.6.18-128.el5PAE ro root=/dev/VolGroup00/LogVol00 rhgb quiet initrd /initrd-2.6.18-128.el5PAE.img I ensured that the guest has the virtio_blk.ko module. rpm -qa | grep kvm: qemu-kvm-0.10.5-6 Guest OS: RHEL 5.3 32/64 Host OS: SlES11 64 Guest OS Image storage type: nfs Is there anything that I am not doing in correct way? Please let me know any other information is required. Thanks Sudhir -- Sudhir Kumar -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] eventfd - revised interface and cleanups
On Tue, 23 Jun 2009, Randy Dunlap wrote: > kernel-doc syntax requires the function name + short description on one line, > followed by parameters. Any longer function description then comes after > the parameters. > > See Documentation/kernel-doc-nano-HOWTO.txt for more info, > or ask me. Will fix, thx! - Davide -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/8] kvm/mmu: make direct mapping paths aware of mapping levels
On Tue, Jun 23, 2009 at 01:47:28PM -0300, Marcelo Tosatti wrote: > On Fri, Jun 19, 2009 at 03:16:26PM +0200, Joerg Roedel wrote: > > @@ -254,7 +254,7 @@ static int is_last_spte(u64 pte, int level) > > { > > if (level == PT_PAGE_TABLE_LEVEL) > > return 1; > > - if (level == PT_DIRECTORY_LEVEL && is_large_pte(pte)) > > + if (is_large_pte(pte)) > > return 1; > > Wouldnt it be safer to check for bit 7 only on the levels > we're sure it means large page? If we are not on PT_PAGE_TABLE_LEVEL and this bit does not mean "large page" then bit 7 is MBZ and harware would fault. So it should be safe to just check for bit 7 here. > > kvm_get_pfn(pfn); > > mmu_set_spte(vcpu, spte, page->role.access, pte_access, 0, 0, > > -gpte & PT_DIRTY_MASK, NULL, largepage, > > +gpte & PT_DIRTY_MASK, NULL, level, > > gpte_to_gfn(gpte), pfn, true); > > It would be better to just turn off updates to large sptes via > update_pte path, so just nuke them and let the pagefault path > handle. Yeah true. Thats in the shadow paging patch. Maybe I can get it into a state to be postable again ;-) Thanks, Joerg -- | Advanced Micro Devices GmbH Operating | Karl-Hammerschmidt-Str. 34, 85609 Dornach bei MĂ¼nchen System| Research | GeschäftsfĂ¼hrer: Thomas M. McCoy, Giuliano Meroni Center| Sitz: Dornach, Gemeinde Aschheim, Landkreis MĂ¼nchen | Registergericht MĂ¼nchen, HRB Nr. 43632 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] eventfd - revised interface and cleanups
On Tue, 23 Jun 2009, Avi Kivity wrote: > On 06/23/2009 07:47 PM, Davide Libenzi wrote: > > The following patch changes the eventfd interface to de-couple the eventfd > > memory context, from the file pointer instance. > > Without such change, there is no clean way to racely free handle the > > POLLHUP event sent when the last instance of the file* goes away. > > Also, now the internal eventfd APIs are using the eventfd context instead > > of the file*. > > Another cleanup this patch does, is making AIO select EVENTFD, instead of > > adding a bunch of empty function stubs inside eventfd.h. > > > > Andrew, this better go via Avi and the KVM tree, since they have patches > > that will be based on the new interface. > > > > The kvm patches will only be ready for 2.6.32. Can this go in 2.6.31 now, and > we'll meet in 10 weeks? Either ways works for me. Meanwhile, I'll repost with the revised documentation format. - Davide -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] QEMU bug tracker on Launchpad
On Tue, 2009-06-23 at 20:02 +0300, Blue Swirl wrote: > On 6/23/09, Anthony Liguori wrote: > > Dustin Kirkland was kind enough to setup a bug tracker for QEMU on > > Launchpad. I would like to make this the official QEMU bug tracker unless > > there is significant objection. > > The links on code tab do not show which is our tree and there are some > Ubuntu trees: > https://code.launchpad.net/qemu > > Can that be fixed? This 'code' tab does not yet support git. The trees you're seeing are bzr. I don't think Anthony has any intention of using the bzr functions of Launchpad, and the qemu git tree will remain at git://git.savannah.nongnu.org/qemu.git. For convenience, I have setup a periodic sync of the latest git (using git fast-export | bzr fast-import -). I'll move the Ubuntu trees elsewhere, sorry for the noise. :-Dustin signature.asc Description: This is a digitally signed message part
Re: [patch] eventfd - revised interface and cleanups
On 06/23/2009 07:47 PM, Davide Libenzi wrote: The following patch changes the eventfd interface to de-couple the eventfd memory context, from the file pointer instance. Without such change, there is no clean way to racely free handle the POLLHUP event sent when the last instance of the file* goes away. Also, now the internal eventfd APIs are using the eventfd context instead of the file*. Another cleanup this patch does, is making AIO select EVENTFD, instead of adding a bunch of empty function stubs inside eventfd.h. Andrew, this better go via Avi and the KVM tree, since they have patches that will be based on the new interface. The kvm patches will only be ready for 2.6.32. Can this go in 2.6.31 now, and we'll meet in 10 weeks? -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] QEMU bug tracker on Launchpad
On 6/23/09, Anthony Liguori wrote: > Dustin Kirkland was kind enough to setup a bug tracker for QEMU on > Launchpad. I would like to make this the official QEMU bug tracker unless > there is significant objection. The links on code tab do not show which is our tree and there are some Ubuntu trees: https://code.launchpad.net/qemu Can that be fixed? > There are a number of QEMU/KVM bug trackers mostly distribution centric > today. Having a common upstream bug tracker will help immensely in > coordinating bug fixing effort for those people who are interested in that > sort of thing :-) > > Here are some of the requirements I had for a bug tracker in no particular > order: > > 1) minimal work required for the QEMU maintainers > 2) ability to link bugs to external bug trackers > 3) ability to control bug status via mails > 4) API for scripting > > The biggest issue for me with Launchpad was that it is not open source > today. Canonical is actively working to release the source code though and > have scheduled a date for release this July. See the link below for more > information. > > I've already begun using this bug tracker so you can look through it today > to get a feeling for what it's like. > > https://bugs.launchpad.net/qemu > https://bugs.launchpad.net/qemu/+bugs > https://dev.launchpad.net/OpenSourcing > > -- > Regards, > > Anthony Liguori > > > > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] eventfd - revised interface and cleanups
Davide Libenzi wrote: > The following patch changes the eventfd interface to de-couple the eventfd > memory context, from the file pointer instance. > Without such change, there is no clean way to racely free handle the > POLLHUP event sent when the last instance of the file* goes away. > Also, now the internal eventfd APIs are using the eventfd context instead > of the file*. > Another cleanup this patch does, is making AIO select EVENTFD, instead of > adding a bunch of empty function stubs inside eventfd.h. > > Andrew, this better go via Avi and the KVM tree, since they have patches > that will be based on the new interface. > > > > Signed-off-by: Davide Libenzi > > > - Davide > > > --- > drivers/lguest/lg.h |2 > drivers/lguest/lguest_user.c |4 - > fs/aio.c | 24 ++ > fs/eventfd.c | 101 > ++- > include/linux/aio.h |4 - > include/linux/eventfd.h | 17 ++- > init/Kconfig |1 > 7 files changed, 108 insertions(+), 45 deletions(-) > > Index: linux-2.6.mod/fs/eventfd.c > === > --- linux-2.6.mod.orig/fs/eventfd.c 2009-06-21 16:54:15.0 -0700 > +++ linux-2.6.mod/fs/eventfd.c2009-06-23 09:34:42.0 -0700 > @@ -17,32 +17,38 @@ > -/* > - * Adds "n" to the eventfd counter "count". Returns "n" in case of > - * success, or a value lower then "n" in case of coutner overflow. > - * This function is supposed to be called by the kernel in paths > - * that do not allow sleeping. In this function we allow the counter > - * to reach the ULLONG_MAX value, and we signal this as overflow > - * condition by returining a POLLERR to poll(2). > +/** > + * eventfd_signal - Adds @n to the eventfd counter. This function is > + * supposed to be called by the kernel in paths that do not > + * allow sleeping. In this function we allow the counter > + * to reach the ULLONG_MAX value, and we signal this as > + * overflow condition by returining a POLLERR to poll(2). > + * kernel-doc syntax requires the function name + short description on one line, followed by parameters. Any longer function description then comes after the parameters. See Documentation/kernel-doc-nano-HOWTO.txt for more info, or ask me. > + * @ctx: [in] Pointer to the eventfd context. > + * @n: [in] Value of the counter to be added to the eventfd internal counter. > + * > + * Returns: In case of success, it returns @n, otherwise (in case of overflow > + * of the eventfd 64bit internal counter) a value lower than @n. > */ > -int eventfd_signal(struct file *file, int n) > +int eventfd_signal(struct eventfd_ctx *ctx, int n) > { > - struct eventfd_ctx *ctx = file->private_data; > unsigned long flags; > > if (n < 0) > +/** > + * eventfd_ctx_put - Releases a reference to the internal eventfd context > + * (previously acquired either with eventfd_ctx_get() or > + * eventfd_ctx_fdget()). > + * > + * @ctx: [in] Pointer to eventfd context. > + * > + * Returns: Nothing. > + */ > +void eventfd_ctx_put(struct eventfd_ctx *ctx) > +{ > + kref_put(&ctx->kref, eventfd_free); > +} > +EXPORT_SYMBOL_GPL(eventfd_ctx_put); > + > static int eventfd_release(struct inode *inode, struct file *file) > { > - kfree(file->private_data); > + struct eventfd_ctx *ctx = file->private_data; > + > + wake_up_poll(&ctx->wqh, POLLHUP); > + eventfd_ctx_put(ctx); > return 0; > } > > +/** > + * eventfd_ctx_fdget - Acquires a reference to the internal eventfd context > + * given an eventfd file descriptor. > + * > + * @fd: [in] Eventfd file descriptor. > + * > + * Returns: In case of success, it returns a pointer to the internal eventfd > + * context, otherwise a proper error code. > + */ > +struct eventfd_ctx *eventfd_ctx_fdget(int fd) > +{ > + struct file *file; > + struct eventfd_ctx *ctx; > + > + file = eventfd_fget(fd); > + if (IS_ERR(file)) > + return (struct eventfd_ctx *) file; > + ctx = eventfd_ctx_get(file->private_data); > + fput(file); > + > + return ctx; > +} > +EXPORT_SYMBOL_GPL(eventfd_ctx_fdget); -- ~Randy LPC 2009, Sept. 23-25, Portland, Oregon http://linuxplumbersconf.org/2009/ -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/8] kvm/mmu: rename is_largepage_backed to mapping_level
On Tue, Jun 23, 2009 at 12:59:33PM -0300, Marcelo Tosatti wrote: > Hi Joerg, > > On Fri, Jun 19, 2009 at 03:16:24PM +0200, Joerg Roedel wrote: > > gfn = unalias_gfn(kvm, gfn); > > - write_count = slot_largepage_idx(gfn, > > -gfn_to_memslot_unaliased(kvm, gfn)); > > - *write_count += 1; > > + > > + for (i = PT_DIRECTORY_LEVEL; > > +i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) { > > + slot = gfn_to_memslot_unaliased(kvm, gfn); > > Can't you move this call out of the loop? True. Will do this. > > @@ -1704,7 +1739,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep, > > if ((pte_access & ACC_WRITE_MASK) > > || (write_fault && !is_write_protection(vcpu) && !user_fault)) { > > > > - if (largepage && has_wrprotected_page(vcpu->kvm, gfn)) { > > + if (largepage && has_wrprotected_page(vcpu->kvm, gfn, 1)) { > > It seems direct_map is missing the large pte overwrite check that > fetch() contains: > > if (is_large_pte(*sptep)) { > rmap_remove(vcpu->kvm, sptep); > __set_spte(sptep, shadow_trap_nonpresent_pte); > kvm_flush_remote_tlbs(vcpu->kvm); > } > > (perhaps its not a possible scenario at the moment, but...). This function is only called from mmu_set_spte which takes care of this. Thanks, Joerg -- | Advanced Micro Devices GmbH Operating | Karl-Hammerschmidt-Str. 34, 85609 Dornach bei MĂ¼nchen System| Research | GeschäftsfĂ¼hrer: Thomas M. McCoy, Giuliano Meroni Center| Sitz: Dornach, Gemeinde Aschheim, Landkreis MĂ¼nchen | Registergericht MĂ¼nchen, HRB Nr. 43632 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch] eventfd - revised interface and cleanups
The following patch changes the eventfd interface to de-couple the eventfd memory context, from the file pointer instance. Without such change, there is no clean way to racely free handle the POLLHUP event sent when the last instance of the file* goes away. Also, now the internal eventfd APIs are using the eventfd context instead of the file*. Another cleanup this patch does, is making AIO select EVENTFD, instead of adding a bunch of empty function stubs inside eventfd.h. Andrew, this better go via Avi and the KVM tree, since they have patches that will be based on the new interface. Signed-off-by: Davide Libenzi - Davide --- drivers/lguest/lg.h |2 drivers/lguest/lguest_user.c |4 - fs/aio.c | 24 ++ fs/eventfd.c | 101 ++- include/linux/aio.h |4 - include/linux/eventfd.h | 17 ++- init/Kconfig |1 7 files changed, 108 insertions(+), 45 deletions(-) Index: linux-2.6.mod/fs/eventfd.c === --- linux-2.6.mod.orig/fs/eventfd.c 2009-06-21 16:54:15.0 -0700 +++ linux-2.6.mod/fs/eventfd.c 2009-06-23 09:34:42.0 -0700 @@ -17,32 +17,38 @@ #include #include #include +#include struct eventfd_ctx { + struct kref kref; wait_queue_head_t wqh; /* * Every time that a write(2) is performed on an eventfd, the * value of the __u64 being written is added to "count" and a * wakeup is performed on "wqh". A read(2) will return the "count" * value to userspace, and will reset "count" to zero. The kernel -* size eventfd_signal() also, adds to the "count" counter and +* side eventfd_signal() also, adds to the "count" counter and * issue a wakeup. */ __u64 count; unsigned int flags; }; -/* - * Adds "n" to the eventfd counter "count". Returns "n" in case of - * success, or a value lower then "n" in case of coutner overflow. - * This function is supposed to be called by the kernel in paths - * that do not allow sleeping. In this function we allow the counter - * to reach the ULLONG_MAX value, and we signal this as overflow - * condition by returining a POLLERR to poll(2). +/** + * eventfd_signal - Adds @n to the eventfd counter. This function is + * supposed to be called by the kernel in paths that do not + * allow sleeping. In this function we allow the counter + * to reach the ULLONG_MAX value, and we signal this as + * overflow condition by returining a POLLERR to poll(2). + * + * @ctx: [in] Pointer to the eventfd context. + * @n: [in] Value of the counter to be added to the eventfd internal counter. + * + * Returns: In case of success, it returns @n, otherwise (in case of overflow + * of the eventfd 64bit internal counter) a value lower than @n. */ -int eventfd_signal(struct file *file, int n) +int eventfd_signal(struct eventfd_ctx *ctx, int n) { - struct eventfd_ctx *ctx = file->private_data; unsigned long flags; if (n < 0) @@ -59,9 +65,49 @@ int eventfd_signal(struct file *file, in } EXPORT_SYMBOL_GPL(eventfd_signal); +static void eventfd_free(struct kref *kref) +{ + struct eventfd_ctx *ctx = container_of(kref, struct eventfd_ctx, kref); + + kfree(ctx); +} + +/** + * eventfd_ctx_get - Acquires a reference to the internal eventfd context. + * + * @ctx: [in] Pointer to the eventfd context. + * + * Returns: In case of success, returns a pointer to the eventfd context, + * otherwise a proper error code. + */ +struct eventfd_ctx *eventfd_ctx_get(struct eventfd_ctx *ctx) +{ + kref_get(&ctx->kref); + return ctx; +} +EXPORT_SYMBOL_GPL(eventfd_ctx_get); + +/** + * eventfd_ctx_put - Releases a reference to the internal eventfd context + * (previously acquired either with eventfd_ctx_get() or + * eventfd_ctx_fdget()). + * + * @ctx: [in] Pointer to eventfd context. + * + * Returns: Nothing. + */ +void eventfd_ctx_put(struct eventfd_ctx *ctx) +{ + kref_put(&ctx->kref, eventfd_free); +} +EXPORT_SYMBOL_GPL(eventfd_ctx_put); + static int eventfd_release(struct inode *inode, struct file *file) { - kfree(file->private_data); + struct eventfd_ctx *ctx = file->private_data; + + wake_up_poll(&ctx->wqh, POLLHUP); + eventfd_ctx_put(ctx); return 0; } @@ -185,6 +231,14 @@ static const struct file_operations even .write = eventfd_write, }; +/** + * eventfd_fget - Acquire a reference of an eventfd file descriptor. + * + * @fd: [in] Eventfd file descriptor. + * + * Returns: A pointer to the eventfd file structure in case of success, or a + * proper error pointer in case of failure. + */ struct file *eventfd_fget(int fd) { struct file *fil
Re: [PATCH 5/8] kvm/mmu: make direct mapping paths aware of mapping levels
On Fri, Jun 19, 2009 at 03:16:26PM +0200, Joerg Roedel wrote: > Signed-off-by: Joerg Roedel > --- > arch/x86/include/asm/kvm_host.h |2 +- > arch/x86/kvm/mmu.c | 60 > ++- > arch/x86/kvm/paging_tmpl.h |6 ++-- > 3 files changed, 38 insertions(+), 30 deletions(-) > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h > index 930bac2..397f992 100644 > --- a/arch/x86/include/asm/kvm_host.h > +++ b/arch/x86/include/asm/kvm_host.h > @@ -314,7 +314,7 @@ struct kvm_vcpu_arch { > struct { > gfn_t gfn; /* presumed gfn during guest pte update */ > pfn_t pfn; /* pfn corresponding to that gfn */ > - int largepage; > + int level; > unsigned long mmu_seq; > } update_pte; > > diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c > index 0ef947d..ef2396d 100644 > --- a/arch/x86/kvm/mmu.c > +++ b/arch/x86/kvm/mmu.c > @@ -254,7 +254,7 @@ static int is_last_spte(u64 pte, int level) > { > if (level == PT_PAGE_TABLE_LEVEL) > return 1; > - if (level == PT_DIRECTORY_LEVEL && is_large_pte(pte)) > + if (is_large_pte(pte)) > return 1; Wouldnt it be safer to check for bit 7 only on the levels we're sure it means large page? > return 0; > } > @@ -1708,7 +1708,7 @@ static int mmu_need_write_protect(struct kvm_vcpu > *vcpu, gfn_t gfn, > > static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep, > unsigned pte_access, int user_fault, > - int write_fault, int dirty, int largepage, > + int write_fault, int dirty, int level, > gfn_t gfn, pfn_t pfn, bool speculative, > bool can_unsync) > { > @@ -1731,7 +1731,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep, > spte |= shadow_nx_mask; > if (pte_access & ACC_USER_MASK) > spte |= shadow_user_mask; > - if (largepage) > + if (level > PT_PAGE_TABLE_LEVEL) > spte |= PT_PAGE_SIZE_MASK; > if (tdp_enabled) > spte |= kvm_x86_ops->get_mt_mask(vcpu, gfn, > @@ -1742,7 +1742,8 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep, > if ((pte_access & ACC_WRITE_MASK) > || (write_fault && !is_write_protection(vcpu) && !user_fault)) { > > - if (largepage && has_wrprotected_page(vcpu->kvm, gfn, 1)) { > + if (level > PT_PAGE_TABLE_LEVEL && > + has_wrprotected_page(vcpu->kvm, gfn, level)) { > ret = 1; > spte = shadow_trap_nonpresent_pte; > goto set_pte; > @@ -1780,7 +1781,7 @@ set_pte: > static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, >unsigned pt_access, unsigned pte_access, >int user_fault, int write_fault, int dirty, > - int *ptwrite, int largepage, gfn_t gfn, > + int *ptwrite, int level, gfn_t gfn, >pfn_t pfn, bool speculative) > { > int was_rmapped = 0; > @@ -1796,7 +1797,8 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 > *sptep, >* If we overwrite a PTE page pointer with a 2MB PMD, unlink >* the parent of the now unreachable PTE. >*/ > - if (largepage && !is_large_pte(*sptep)) { > + if (level > PT_PAGE_TABLE_LEVEL && > + !is_large_pte(*sptep)) { Should probably get rid of this entirely (or BUG_ON), since a better place to handle this is at fetch / direct_map as mentioned in the other email. But we can do that later, incrementally. > struct kvm_mmu_page *child; > u64 pte = *sptep; > > @@ -1809,8 +1811,9 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 > *sptep, > } else > was_rmapped = 1; > } > + > if (set_spte(vcpu, sptep, pte_access, user_fault, write_fault, > - dirty, largepage, gfn, pfn, speculative, true)) { > + dirty, level, gfn, pfn, speculative, true)) { > if (write_fault) > *ptwrite = 1; > kvm_x86_ops->tlb_flush(vcpu); > @@ -1846,7 +1849,7 @@ static void nonpaging_new_cr3(struct kvm_vcpu *vcpu) > } > > static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write, > - int largepage, gfn_t gfn, pfn_t pfn) > + int level, gfn_t gfn, pfn_t pfn) > { > struct kvm_shadow_walk_iterator iterator; > struct kvm_mmu_page *sp; > @@ -1854,11 +1857,10 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t > v, int write, > gfn_t pseudo_gfn; > > for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) { > - if (iterator.level == PT_PAGE_TABLE_LEVEL > - || (largepag
Re: [PATCH 2/8] kvm: change memslot data structures for multiple hugepage sizes
On Fri, Jun 19, 2009 at 03:16:23PM +0200, Joerg Roedel wrote: > /* > @@ -724,11 +724,11 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned > long hva, > end = start + (memslot->npages << PAGE_SHIFT); > if (hva >= start && hva < end) { > gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT; > + int idx = gfn_offset / > + KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL); > retval |= handler(kvm, &memslot->rmap[gfn_offset]); > retval |= handler(kvm, > - &memslot->lpage_info[ > - gfn_offset / > - > KVM_PAGES_PER_HPAGE].rmap_pde); > + &memslot->lpage_info[0][idx].rmap_pde); > } Cannot find where you update this function to reflect 1GB pages in the series? -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/8] kvm/mmu: rename is_largepage_backed to mapping_level
Hi Joerg, On Fri, Jun 19, 2009 at 03:16:24PM +0200, Joerg Roedel wrote: > With the new name and the corresponding backend changes this function > can now support multiple hugepage sizes. > > Signed-off-by: Joerg Roedel > --- > arch/x86/kvm/mmu.c | 100 +-- > arch/x86/kvm/paging_tmpl.h |4 +- > 2 files changed, 69 insertions(+), 35 deletions(-) > > diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c > index 1f24d88..3fa6009 100644 > --- a/arch/x86/kvm/mmu.c > +++ b/arch/x86/kvm/mmu.c > @@ -390,37 +390,52 @@ static void mmu_free_rmap_desc(struct kvm_rmap_desc *rd) > * Return the pointer to the largepage write count for a given > * gfn, handling slots that are not large page aligned. > */ > -static int *slot_largepage_idx(gfn_t gfn, struct kvm_memory_slot *slot) > +static int *slot_largepage_idx(gfn_t gfn, > +struct kvm_memory_slot *slot, > +int level) > { > unsigned long idx; > > - idx = (gfn / KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL)) - > - (slot->base_gfn / KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL)); > - return &slot->lpage_info[0][idx].write_count; > + idx = (gfn / KVM_PAGES_PER_HPAGE(level)) - > + (slot->base_gfn / KVM_PAGES_PER_HPAGE(level)); > + return &slot->lpage_info[level - 2][idx].write_count; > } > > static void account_shadowed(struct kvm *kvm, gfn_t gfn) > { > + struct kvm_memory_slot *slot; > int *write_count; > + int i; > > gfn = unalias_gfn(kvm, gfn); > - write_count = slot_largepage_idx(gfn, > - gfn_to_memslot_unaliased(kvm, gfn)); > - *write_count += 1; > + > + for (i = PT_DIRECTORY_LEVEL; > + i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) { > + slot = gfn_to_memslot_unaliased(kvm, gfn); Can't you move this call out of the loop? > + write_count = slot_largepage_idx(gfn, slot, i); > + *write_count += 1; > + } > } > > static void unaccount_shadowed(struct kvm *kvm, gfn_t gfn) > { > + struct kvm_memory_slot *slot; > int *write_count; > + int i; > > gfn = unalias_gfn(kvm, gfn); > - write_count = slot_largepage_idx(gfn, > - gfn_to_memslot_unaliased(kvm, gfn)); > - *write_count -= 1; > - WARN_ON(*write_count < 0); > + for (i = PT_DIRECTORY_LEVEL; > + i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) { > + slot = gfn_to_memslot_unaliased(kvm, gfn); > + write_count = slot_largepage_idx(gfn, slot, i); > + *write_count -= 1; > + WARN_ON(*write_count < 0); > + } > } > > -static int has_wrprotected_page(struct kvm *kvm, gfn_t gfn) > +static int has_wrprotected_page(struct kvm *kvm, > + gfn_t gfn, > + int level) > { > struct kvm_memory_slot *slot; > int *largepage_idx; > @@ -428,47 +443,67 @@ static int has_wrprotected_page(struct kvm *kvm, gfn_t > gfn) > gfn = unalias_gfn(kvm, gfn); > slot = gfn_to_memslot_unaliased(kvm, gfn); > if (slot) { > - largepage_idx = slot_largepage_idx(gfn, slot); > + largepage_idx = slot_largepage_idx(gfn, slot, level); > return *largepage_idx; > } > > return 1; > } > > -static int host_largepage_backed(struct kvm *kvm, gfn_t gfn) > +static int host_mapping_level(struct kvm *kvm, gfn_t gfn) > { > + unsigned long page_size = PAGE_SIZE; > struct vm_area_struct *vma; > unsigned long addr; > - int ret = 0; > + int i, ret = 0; > > addr = gfn_to_hva(kvm, gfn); > if (kvm_is_error_hva(addr)) > - return ret; > + return page_size; > > down_read(¤t->mm->mmap_sem); > vma = find_vma(current->mm, addr); > - if (vma && is_vm_hugetlb_page(vma)) > - ret = 1; > + if (!vma) > + goto out; > + > + page_size = vma_kernel_pagesize(vma); > + > +out: > up_read(¤t->mm->mmap_sem); > > + for (i = PT_PAGE_TABLE_LEVEL; > + i < (PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES); ++i) { > + if (page_size >= KVM_HPAGE_SIZE(i)) > + ret = i; > + else > + break; > + } > + > return ret; > } > > -static int is_largepage_backed(struct kvm_vcpu *vcpu, gfn_t large_gfn) > +static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn) > { > struct kvm_memory_slot *slot; > - > - if (has_wrprotected_page(vcpu->kvm, large_gfn)) > - return 0; > - > - if (!host_largepage_backed(vcpu->kvm, large_gfn)) > - return 0; > + int host_level; > + int level = PT_PAGE_TABLE_LEVEL; > > slot = gfn_to_memslot(vcpu->kvm, large_gfn); > if (slot && slot->dirty_bitmap) > - re
Re: [PATCH] kvm: remove in_range from kvm_io_device
Michael S. Tsirkin wrote: > On Tue, Jun 23, 2009 at 11:44:57AM -0400, Gregory Haskins wrote: > This proposed approach forces us into a potential O(256) algorithm in the hotpath (all MMIO/PIO exits will hit this, not just in-kernel users). How would you address this? >>> Two ideas that come to mind: >>> - add addr/len fields to devices, use these to speed up lookup >>> >>> >> Yep, thats what I was thinking as well. We can have the top-level >> (group) be an rbtree on addr/len, and then walk the list of items at >> that address linearly using your read/write() approach. >> >> >> >>> - add a small cache that can be scanned first >>> >>> >> Yep, I think we may want to do this anyway independent of the search alg. >> >> >>> In both cases, you first do a fast lookup, ask the device whether >>> it wants the transaction, then resort to linear scan if not >>> >>> >> -Greg >> >> > > Looks like we have a concensus then. > > Yep. I'll try to go over the patch in detail today and provide any more feedback. Thanks Michael, -Greg signature.asc Description: OpenPGP digital signature
Re: [PATCH] kvm: remove in_range from kvm_io_device
On Tue, Jun 23, 2009 at 11:44:57AM -0400, Gregory Haskins wrote: > >> This proposed approach forces us into a > >> potential O(256) algorithm in the hotpath (all MMIO/PIO exits will hit > >> this, not just in-kernel users). How would you address this? > >> > > > > Two ideas that come to mind: > > - add addr/len fields to devices, use these to speed up lookup > > > > Yep, thats what I was thinking as well. We can have the top-level > (group) be an rbtree on addr/len, and then walk the list of items at > that address linearly using your read/write() approach. > > > > - add a small cache that can be scanned first > > > > Yep, I think we may want to do this anyway independent of the search alg. > > > In both cases, you first do a fast lookup, ask the device whether > > it wants the transaction, then resort to linear scan if not > > > > -Greg > Looks like we have a concensus then. -- MST -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] Planning for the 0.11.0 release
On 06/23/2009 06:09 PM, Blue Swirl wrote: I think this is great, but OpenBIOS still uses Subversion. Can git use SVN submodules for example? No, but we could have a git svn mirror and include that. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] Planning for the 0.11.0 release
Blue Swirl wrote: On 6/23/09, Anthony Liguori wrote: Hi, It's getting to be about the time to start thinking about the 0.11.0 release. 0.10.0 was released on March 2nd so following with the 6 month release cycle, that would put 0.11.0 at September 2nd. Based on the experiences with the stable releases, here's what I'd recommend: o On July 15th, fork master -> stable-0.11 o Change version to 0.10.90 o Release qemu-0.11.0-rc1 o Release additional -rcN releases every 1-2 weeks o Introduce a new maintainer for stable-0.10 (via git pulls) o At least 1 week before release, hopefully we'll have the final -rcN that we can then declare 0.11.0. Sounds OK. I think OpenBIOS releases should follow similar schedule, maybe even with matching SVN tags (1.1-rc1 for 0.11.0-rc1 etc). I think we should really try hard to make these dates. I only have a few things that I would like to see happen before forking stable-0.11. Namely: o Setup qemu.org infrastructure (git hosting, wiki) o Setup qemu bug tracker (see next mail) o Include all ROM source code in tree via git submodules. This is a major headache for distributors and I think it's important to resolve before our next release. I think this is great, but OpenBIOS still uses Subversion. Can git use SVN submodules for example? No, but it's not too hard to make svn->git bridges. Regards, Anthony Liguori -- Regards, Anthony Liguori -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvm: remove in_range from kvm_io_device
Michael S. Tsirkin wrote: > On Tue, Jun 23, 2009 at 11:21:53AM -0400, Gregory Haskins wrote: > >> Michael S. Tsirkin wrote: >> >>> Remove in_range from kvm_io_device and ask read/write callbacks, if >>> supplied, to perform range checks internally. This allows aliasing >>> (mostly for in-kernel virtio), as well as better error handling by >>> making it possible to pass errors up to userspace. And it's enough to >>> look at the diffstat to see that it's a better API anyway. >>> >>> While we are at it, document locking rules for kvm_io_device. >>> >>> >> Sorry, not trying to be a PITA, but I liked your last suggestion better. :( >> >> I am thinking forward to when we want to use something smarter than a >> linear search (like rbtree/radix) for scaling the number of "devices" >> (really, virtio-rings) that we support. >> > > in_range is broken for this anyway: you need more than a boolean > predicate to implement rbtree/radix > Yes, understood..in_range() needs to be (pardon the pun) "addressed" ;). But getting rid of in_range() and moving the match logic into the read()/write() verbs is potentially a step in the wrong direction if we ever wanted to go that route. And I'm pretty sure we do. > >> The current device-count >> target is 512, which we will begin to rapidly consume as the in-kernel >> virtio work progresses. >> > > That's a large number. I had in mind more like 4 virtio devices, for > starters: 1 for each virtqueue in net and block. > Thats way to low. For instance, I'll be wanting to do things like 802.1p which would be 16 virtio-rings per device (8 prio levels tx, 8 levels rx). And thats just for one device. I think Avi came up with an estimate of supporting 20 devices @ 16 queues = 320, so we rounded it to 512. > >> This proposed approach forces us into a >> potential O(256) algorithm in the hotpath (all MMIO/PIO exits will hit >> this, not just in-kernel users). How would you address this? >> > > Two ideas that come to mind: > - add addr/len fields to devices, use these to speed up lookup > Yep, thats what I was thinking as well. We can have the top-level (group) be an rbtree on addr/len, and then walk the list of items at that address linearly using your read/write() approach. > - add a small cache that can be scanned first > Yep, I think we may want to do this anyway independent of the search alg. > In both cases, you first do a fast lookup, ask the device whether > it wants the transaction, then resort to linear scan if not > -Greg signature.asc Description: OpenPGP digital signature
Re: [PATCH] kvm: remove in_range from kvm_io_device
On Tue, Jun 23, 2009 at 11:21:53AM -0400, Gregory Haskins wrote: > Michael S. Tsirkin wrote: > > Remove in_range from kvm_io_device and ask read/write callbacks, if > > supplied, to perform range checks internally. This allows aliasing > > (mostly for in-kernel virtio), as well as better error handling by > > making it possible to pass errors up to userspace. And it's enough to > > look at the diffstat to see that it's a better API anyway. > > > > While we are at it, document locking rules for kvm_io_device. > > > > Sorry, not trying to be a PITA, but I liked your last suggestion better. :( > > I am thinking forward to when we want to use something smarter than a > linear search (like rbtree/radix) for scaling the number of "devices" > (really, virtio-rings) that we support. in_range is broken for this anyway: you need more than a boolean predicate to implement rbtree/radix > The current device-count > target is 512, which we will begin to rapidly consume as the in-kernel > virtio work progresses. That's a large number. I had in mind more like 4 virtio devices, for starters: 1 for each virtqueue in net and block. > This proposed approach forces us into a > potential O(256) algorithm in the hotpath (all MMIO/PIO exits will hit > this, not just in-kernel users). How would you address this? Two ideas that come to mind: - add addr/len fields to devices, use these to speed up lookup - add a small cache that can be scanned first In both cases, you first do a fast lookup, ask the device whether it wants the transaction, then resort to linear scan if not -- MST -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch 0/3] fixes for kvm on s390
Hello Avi, here are three patches against kvm.git that fix several issues in our kvm port. Please review and consider all patches for 2.6.31. Christian -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch 3/3] kvm-s390: Remove some unused variables
From: Christian Borntraeger This patch fixes the following warnings that were introduced by commit 2921292f45733bccdb53e426bcf65ceb13f53d94 Author: Gleb Natapov KVM: Use macro to iterate over vcpus. arch/s390/kvm/kvm-s390.c: In function 'kvm_arch_set_memory_region': arch/s390/kvm/kvm-s390.c:687: warning: unused variable 'r' arch/s390/kvm/kvm-s390.c:687: warning: unused variable 'j' CC: Gleb Natapov Signed-off-by: Christian Borntraeger --- arch/s390/kvm/kvm-s390.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Index: kvm/arch/s390/kvm/kvm-s390.c === --- kvm.orig/arch/s390/kvm/kvm-s390.c +++ kvm/arch/s390/kvm/kvm-s390.c @@ -684,7 +684,7 @@ int kvm_arch_set_memory_region(struct kv struct kvm_memory_slot old, int user_alloc) { - int i, j = 0, r = -EINVAL; + int i; struct kvm_vcpu *vcpu; /* A few sanity checks. We can have exactly one memory slot which has -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch 2/3] kvm-s390: Allow stfle instruction in the guest
From: Christian Borntraeger 2.6.31-rc introduced an architecture level set checker based on facility bits. e.g. if the kernel is compiled to run only on z9, several facility bits are checked very early and the kernel refuses to boot if a z9 specific facility is missing. Until now kvm on s390 did not implement the store facility extended (STFLE) instruction. A 2.6.31-rc kernel that was compiled for z9 or higher did not boot in kvm. This patch implements stfle. This patch should go in before 2.6.31. Signed-off-by: Christian Borntraeger --- arch/s390/include/asm/kvm_host.h |4 +++- arch/s390/kvm/kvm-s390.c | 23 ++- arch/s390/kvm/priv.c |2 +- 3 files changed, 26 insertions(+), 3 deletions(-) Index: kvm/arch/s390/include/asm/kvm_host.h === --- kvm.orig/arch/s390/include/asm/kvm_host.h +++ kvm/arch/s390/include/asm/kvm_host.h @@ -99,7 +99,9 @@ struct kvm_s390_sie_block { __u8reservedd0[48]; /* 0x00d0 */ __u64 gcr[16];/* 0x0100 */ __u64 gbea; /* 0x0180 */ - __u8reserved188[120]; /* 0x0188 */ + __u8reserved188[24];/* 0x0188 */ + __u32 fac;/* 0x01a0 */ + __u8reserved1a4[92];/* 0x01a4 */ } __attribute__((packed)); struct kvm_vcpu_stat { Index: kvm/arch/s390/kvm/kvm-s390.c === --- kvm.orig/arch/s390/kvm/kvm-s390.c +++ kvm/arch/s390/kvm/kvm-s390.c @@ -26,6 +26,7 @@ #include #include #include +#include #include "kvm-s390.h" #include "gaccess.h" @@ -70,6 +71,7 @@ struct kvm_stats_debugfs_item debugfs_en { NULL } }; +static unsigned long long *facilities; /* Section: not file related */ void kvm_arch_hardware_enable(void *garbage) @@ -287,6 +289,7 @@ int kvm_arch_vcpu_setup(struct kvm_vcpu set_bit(KVM_REQ_MMU_RELOAD, &vcpu->requests); vcpu->arch.sie_block->ecb = 2; vcpu->arch.sie_block->eca = 0xC1002001U; + vcpu->arch.sie_block->fac = (int) (long) facilities; hrtimer_init(&vcpu->arch.ckc_timer, CLOCK_REALTIME, HRTIMER_MODE_ABS); tasklet_init(&vcpu->arch.tasklet, kvm_s390_tasklet, (unsigned long) vcpu); @@ -727,11 +730,29 @@ gfn_t unalias_gfn(struct kvm *kvm, gfn_t static int __init kvm_s390_init(void) { - return kvm_init(NULL, sizeof(struct kvm_vcpu), THIS_MODULE); + int ret; + ret = kvm_init(NULL, sizeof(struct kvm_vcpu), THIS_MODULE); + if (ret) + return ret; + + /* +* guests can ask for up to 255+1 double words, we need a full page +* to hold the maximum amount of facilites. On the other hand, we +* only set facilities that are known to work in KVM. +*/ + facilities = (unsigned long long *) get_zeroed_page(GFP_DMA); + if (!facilities) { + kvm_exit(); + return -ENOMEM; + } + stfle(facilities, 1); + facilities[0] &= 0xff00fff3f070ULL; + return 0; } static void __exit kvm_s390_exit(void) { + free_page((unsigned long) facilities); kvm_exit(); } Index: kvm/arch/s390/kvm/priv.c === --- kvm.orig/arch/s390/kvm/priv.c +++ kvm/arch/s390/kvm/priv.c @@ -158,7 +158,7 @@ static int handle_stfl(struct kvm_vcpu * vcpu->stat.instruction_stfl++; /* only pass the facility bits, which we can handle */ - facility_list &= 0xfe00fff3; + facility_list &= 0xff00fff3; rc = copy_to_guest(vcpu, offsetof(struct _lowcore, stfl_fac_list), &facility_list, sizeof(facility_list)); -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch 1/3] kvm-s390: Fix memslot initialization for userspace_addr != 0
From: Christian Borntraeger Since commit 854b5338196b1175706e99d63be43a4f8d8ab607 Author: Christian Ehrhardt KVM: s390: streamline memslot handling s390 uses the values of the memslot instead of doing everything in the arch ioctl handler of the KVM_SET_USER_MEMORY_REGION. Unfortunately we missed to set the userspace_addr of our memslot due to our s390 ifdef in __kvm_set_memory_region. Old s390 userspace launchers did not notice, since they started the guest at userspace address 0. Because of CONFIG_DEFAULT_MMAP_MIN_ADDR we now put the guest at 1M userspace, which does not work. This patch makes sure that new.userspace_addr is set on s390. This fix should go in quickly. Nevertheless, looking at the code we should clean up that ifdef in the long term. Any kernel janitors? Signed-off-by: Christian Borntraeger --- virt/kvm/kvm_main.c |4 1 file changed, 4 insertions(+) Index: kvm/virt/kvm/kvm_main.c === --- kvm.orig/virt/kvm/kvm_main.c +++ kvm/virt/kvm/kvm_main.c @@ -1199,6 +1199,10 @@ int __kvm_set_memory_region(struct kvm * if (old.npages) kvm_arch_flush_shadow(kvm); } +#else /* not defined CONFIG_S390 */ + new.user_alloc = user_alloc; + if (user_alloc) + new.userspace_addr = mem->userspace_addr; #endif /* not defined CONFIG_S390 */ if (!npages) -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvm: remove in_range from kvm_io_device
Michael S. Tsirkin wrote: > Remove in_range from kvm_io_device and ask read/write callbacks, if > supplied, to perform range checks internally. This allows aliasing > (mostly for in-kernel virtio), as well as better error handling by > making it possible to pass errors up to userspace. And it's enough to > look at the diffstat to see that it's a better API anyway. > > While we are at it, document locking rules for kvm_io_device. > Sorry, not trying to be a PITA, but I liked your last suggestion better. :( I am thinking forward to when we want to use something smarter than a linear search (like rbtree/radix) for scaling the number of "devices" (really, virtio-rings) that we support. The current device-count target is 512, which we will begin to rapidly consume as the in-kernel virtio work progresses. This proposed approach forces us into a potential O(256) algorithm in the hotpath (all MMIO/PIO exits will hit this, not just in-kernel users). How would you address this? Regards, -Greg > Signed-off-by: Michael S. Tsirkin > --- > > This is a replacement for my patch extending in_range. > This works for me, but please review carefully. > > arch/ia64/kvm/kvm-ia64.c | 28 - > arch/x86/kvm/i8254.c | 49 -- > arch/x86/kvm/i8259.c | 22 ++ > arch/x86/kvm/lapic.c | 48 ++ > arch/x86/kvm/x86.c| 100 > +++-- > include/linux/kvm_host.h |6 ++- > virt/kvm/coalesced_mmio.c | 26 +--- > virt/kvm/ioapic.c | 24 ++- > virt/kvm/iodev.h | 34 ++- > virt/kvm/kvm_main.c | 25 +++- > 10 files changed, 158 insertions(+), 204 deletions(-) > > diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c > index d20a5db..647ff91 100644 > --- a/arch/ia64/kvm/kvm-ia64.c > +++ b/arch/ia64/kvm/kvm-ia64.c > @@ -198,16 +198,6 @@ int kvm_dev_ioctl_check_extension(long ext) > > } > > -static struct kvm_io_device *vcpu_find_mmio_dev(struct kvm_vcpu *vcpu, > - gpa_t addr, int len, int is_write) > -{ > - struct kvm_io_device *dev; > - > - dev = kvm_io_bus_find_dev(&vcpu->kvm->mmio_bus, addr, len, is_write); > - > - return dev; > -} > - > static int handle_vm_error(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) > { > kvm_run->exit_reason = KVM_EXIT_UNKNOWN; > @@ -219,6 +209,7 @@ static int handle_mmio(struct kvm_vcpu *vcpu, struct > kvm_run *kvm_run) > { > struct kvm_mmio_req *p; > struct kvm_io_device *mmio_dev; > + int r; > > p = kvm_get_vcpu_ioreq(vcpu); > > @@ -235,16 +226,13 @@ static int handle_mmio(struct kvm_vcpu *vcpu, struct > kvm_run *kvm_run) > kvm_run->exit_reason = KVM_EXIT_MMIO; > return 0; > mmio: > - mmio_dev = vcpu_find_mmio_dev(vcpu, p->addr, p->size, !p->dir); > - if (mmio_dev) { > - if (!p->dir) > - kvm_iodevice_write(mmio_dev, p->addr, p->size, > - &p->data); > - else > - kvm_iodevice_read(mmio_dev, p->addr, p->size, > - &p->data); > - > - } else > + if (p->dir) > + r = kvm_io_bus_read(&vcpu->kvm->mmio_bus, p->addr, p->size, > + &p->data); > + else > + r = kvm_io_bus_write(&vcpu->kvm->mmio_bus, p->addr, p->size, > + &p->data); > + if (r) > printk(KERN_ERR"kvm: No iodevice found! addr:%lx\n", p->addr); > p->state = STATE_IORESP_READY; > > diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c > index c13bb92..d5881a4 100644 > --- a/arch/x86/kvm/i8254.c > +++ b/arch/x86/kvm/i8254.c > @@ -336,8 +336,14 @@ void kvm_pit_load_count(struct kvm *kvm, int channel, > u32 val) > mutex_unlock(&kvm->arch.vpit->pit_state.lock); > } > > -static void pit_ioport_write(struct kvm_io_device *this, > - gpa_t addr, int len, const void *data) > +static inline int pit_in_range(gpa_t addr) > +{ > + return ((addr >= KVM_PIT_BASE_ADDRESS) && > + (addr < KVM_PIT_BASE_ADDRESS + KVM_PIT_MEM_LENGTH)); > +} > + > +static int pit_ioport_write(struct kvm_io_device *this, > + gpa_t addr, int len, const void *data) > { > struct kvm_pit *pit = (struct kvm_pit *)this->private; > struct kvm_kpit_state *pit_state = &pit->pit_state; > @@ -345,6 +351,8 @@ static void pit_ioport_write(struct kvm_io_device *this, > int channel, access; > struct kvm_kpit_channel_state *s; > u32 val = *(u32 *) data; > + if (!pit_in_range(addr)) > + return -EOPNOTSUPP; > > val &= 0xff; > addr &= KVM_PIT_CHANNEL_MASK; > @@ -407,16 +415,19 @@ static void pit_ioport_write(struct kvm_io_device *this, > } > > mutex_unlock(&pit_state->loc
Re: [Qemu-devel] [PATCH] virtio-serial: virtio device for simple host <-> guest communication
On 6/23/09, Amit Shah wrote: > This interface presents a char device from which bits can be > sent and read. > +struct virtio_serial_config > +{ > +uint32_t nr_ports; > +uint16_t status; > +} __attribute__((packed)); Obviously this has to match the kernel structure if you go for 16 bit nr_ports. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] virtio_serial: A char device for simple guest <-> host communication
On 6/23/09, Amit Shah wrote: > We expose multiple char devices ("ports") for simple communication > between the host userspace and guest. > +struct virtio_serial_config { > + __u32 nr_ports; > + __u16 status; > +} __attribute__((packed)); There is still structure packing. I'd use __u16 for both fields, do you really need 4 gigs of ports?. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] Planning for the 0.11.0 release
On 6/23/09, Anthony Liguori wrote: > Hi, > > It's getting to be about the time to start thinking about the 0.11.0 > release. 0.10.0 was released on March 2nd so following with the 6 month > release cycle, that would put 0.11.0 at September 2nd. > > Based on the experiences with the stable releases, here's what I'd > recommend: > > o On July 15th, fork master -> stable-0.11 > o Change version to 0.10.90 > o Release qemu-0.11.0-rc1 > o Release additional -rcN releases every 1-2 weeks > o Introduce a new maintainer for stable-0.10 (via git pulls) > o At least 1 week before release, hopefully we'll have the final -rcN that > we can then declare 0.11.0. Sounds OK. I think OpenBIOS releases should follow similar schedule, maybe even with matching SVN tags (1.1-rc1 for 0.11.0-rc1 etc). > I think we should really try hard to make these dates. I only have a few > things that I would like to see happen before forking stable-0.11. Namely: > > o Setup qemu.org infrastructure (git hosting, wiki) > o Setup qemu bug tracker (see next mail) > o Include all ROM source code in tree via git submodules. This is a major > headache for distributors and I think it's important to resolve before our > next release. I think this is great, but OpenBIOS still uses Subversion. Can git use SVN submodules for example? -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] eventfd: add internal reference counting to fix notifier race conditions
On Tue, Jun 23, 2009 at 07:35:00AM -0700, Davide Libenzi wrote: > On Tue, 23 Jun 2009, Gregory Haskins wrote: > > > Davide Libenzi wrote: > > > On Mon, 22 Jun 2009, Gregory Haskins wrote: > > > > > > > > >> To be honest, I am not sure. I would guess its not a *huge* deal, > > >> though it was obviously enough of a concern to at least discuss it. I > > >> can definitely say that I think the other issues which are being fixed > > >> are substantially more important. > > >> > > > > > > Ok then, will repost the revised patch later today. > > > > > > > Ok sounds good. I did have a chance to take a closer look at your > > proposal for the key data, and I think it makes a lot of sense. Do you > > see it as a problem if we defer adding that? Or should we try to add > > that notion now? > > That would need to go eventually via mainline, after some discussion. But > yes, I believe that using the "key" as simple bitmask is a little > restrictive. > > > - Davide I agree. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] virtio-serial: A guest <-> host interface for simple communication
On Tue, Jun 23, 2009 at 01:55:52PM +0100, Paul Brook wrote: > > Here are two patches. One implements a virtio-serial device in qemu > > and the other is the driver for a guest kernel. > > So I'll ask again. Why is this separate from virtio-console? In the guest I wouldn't want virtio-serial devices to be mixed up with the virtio-console device. virtio-console has nice clear usecase of being an interactive console, and as such the guest OS can & should automatically start a mingetty/agetty process on any virtio-console device it finds. If we use virtio-console for data channels to, then guest config becomes much harder todo automatically. By all means share underlying code/infrastructure where appropriate, but they must ultimately appear as clearly separate devices IMHO Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :| -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] kvm: remove in_range from kvm_io_device
Remove in_range from kvm_io_device and ask read/write callbacks, if supplied, to perform range checks internally. This allows aliasing (mostly for in-kernel virtio), as well as better error handling by making it possible to pass errors up to userspace. And it's enough to look at the diffstat to see that it's a better API anyway. While we are at it, document locking rules for kvm_io_device. Signed-off-by: Michael S. Tsirkin --- This is a replacement for my patch extending in_range. This works for me, but please review carefully. arch/ia64/kvm/kvm-ia64.c | 28 - arch/x86/kvm/i8254.c | 49 -- arch/x86/kvm/i8259.c | 22 ++ arch/x86/kvm/lapic.c | 48 ++ arch/x86/kvm/x86.c| 100 +++-- include/linux/kvm_host.h |6 ++- virt/kvm/coalesced_mmio.c | 26 +--- virt/kvm/ioapic.c | 24 ++- virt/kvm/iodev.h | 34 ++- virt/kvm/kvm_main.c | 25 +++- 10 files changed, 158 insertions(+), 204 deletions(-) diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c index d20a5db..647ff91 100644 --- a/arch/ia64/kvm/kvm-ia64.c +++ b/arch/ia64/kvm/kvm-ia64.c @@ -198,16 +198,6 @@ int kvm_dev_ioctl_check_extension(long ext) } -static struct kvm_io_device *vcpu_find_mmio_dev(struct kvm_vcpu *vcpu, - gpa_t addr, int len, int is_write) -{ - struct kvm_io_device *dev; - - dev = kvm_io_bus_find_dev(&vcpu->kvm->mmio_bus, addr, len, is_write); - - return dev; -} - static int handle_vm_error(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) { kvm_run->exit_reason = KVM_EXIT_UNKNOWN; @@ -219,6 +209,7 @@ static int handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) { struct kvm_mmio_req *p; struct kvm_io_device *mmio_dev; + int r; p = kvm_get_vcpu_ioreq(vcpu); @@ -235,16 +226,13 @@ static int handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) kvm_run->exit_reason = KVM_EXIT_MMIO; return 0; mmio: - mmio_dev = vcpu_find_mmio_dev(vcpu, p->addr, p->size, !p->dir); - if (mmio_dev) { - if (!p->dir) - kvm_iodevice_write(mmio_dev, p->addr, p->size, - &p->data); - else - kvm_iodevice_read(mmio_dev, p->addr, p->size, - &p->data); - - } else + if (p->dir) + r = kvm_io_bus_read(&vcpu->kvm->mmio_bus, p->addr, p->size, + &p->data); + else + r = kvm_io_bus_write(&vcpu->kvm->mmio_bus, p->addr, p->size, +&p->data); + if (r) printk(KERN_ERR"kvm: No iodevice found! addr:%lx\n", p->addr); p->state = STATE_IORESP_READY; diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c index c13bb92..d5881a4 100644 --- a/arch/x86/kvm/i8254.c +++ b/arch/x86/kvm/i8254.c @@ -336,8 +336,14 @@ void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val) mutex_unlock(&kvm->arch.vpit->pit_state.lock); } -static void pit_ioport_write(struct kvm_io_device *this, -gpa_t addr, int len, const void *data) +static inline int pit_in_range(gpa_t addr) +{ + return ((addr >= KVM_PIT_BASE_ADDRESS) && + (addr < KVM_PIT_BASE_ADDRESS + KVM_PIT_MEM_LENGTH)); +} + +static int pit_ioport_write(struct kvm_io_device *this, + gpa_t addr, int len, const void *data) { struct kvm_pit *pit = (struct kvm_pit *)this->private; struct kvm_kpit_state *pit_state = &pit->pit_state; @@ -345,6 +351,8 @@ static void pit_ioport_write(struct kvm_io_device *this, int channel, access; struct kvm_kpit_channel_state *s; u32 val = *(u32 *) data; + if (!pit_in_range(addr)) + return -EOPNOTSUPP; val &= 0xff; addr &= KVM_PIT_CHANNEL_MASK; @@ -407,16 +415,19 @@ static void pit_ioport_write(struct kvm_io_device *this, } mutex_unlock(&pit_state->lock); + return 0; } -static void pit_ioport_read(struct kvm_io_device *this, - gpa_t addr, int len, void *data) +static int pit_ioport_read(struct kvm_io_device *this, + gpa_t addr, int len, void *data) { struct kvm_pit *pit = (struct kvm_pit *)this->private; struct kvm_kpit_state *pit_state = &pit->pit_state; struct kvm *kvm = pit->kvm; int ret, count; struct kvm_kpit_channel_state *s; + if (!pit_in_range(addr)) + return -EOPNOTSUPP; addr &= KVM_PIT_CHANNEL_MASK; s = &pit_state->channels[addr]; @@ -471,37 +482,36 @@ static void pit_ioport_read(struct kvm_io_device *this, memcpy(data, (cha
Re: [KVM PATCH v3 3/3] KVM: Fix races in irqfd using new eventfd_kref_get interface
Gregory Haskins wrote: > Michael S. Tsirkin wrote: > > >> And note that multiple works can run on irqfd >> in parallel. >> >> > > They can? I thought work-queue items were guaranteed to only schedule > once? If what you say is true, its broken, I agree, and Ill need to > revisit. Let me get back to you. > Yep, you are right. This is broken in my code. :( Will address in the next spin. Good catch. -Greg signature.asc Description: OpenPGP digital signature
Re: [PATCH 3/3] eventfd: add internal reference counting to fix notifier race conditions
Davide Libenzi wrote: > On Tue, 23 Jun 2009, Gregory Haskins wrote: > > >> Davide Libenzi wrote: >> >>> On Mon, 22 Jun 2009, Gregory Haskins wrote: >>> >>> >>> To be honest, I am not sure. I would guess its not a *huge* deal, though it was obviously enough of a concern to at least discuss it. I can definitely say that I think the other issues which are being fixed are substantially more important. >>> Ok then, will repost the revised patch later today. >>> >>> >> Ok sounds good. I did have a chance to take a closer look at your >> proposal for the key data, and I think it makes a lot of sense. Do you >> see it as a problem if we defer adding that? Or should we try to add >> that notion now? >> > > That would need to go eventually via mainline, after some discussion. But > yes, I believe that using the "key" as simple bitmask is a little > restrictive. > Ok. As long as you do not think we are somehow painting ourselves into a corner, lets just go with your original plan for the revised patch sans key changes. Thanks Davide, -Greg signature.asc Description: OpenPGP digital signature
Re: [PATCH 3/3] eventfd: add internal reference counting to fix notifier race conditions
On Tue, 23 Jun 2009, Gregory Haskins wrote: > Davide Libenzi wrote: > > On Mon, 22 Jun 2009, Gregory Haskins wrote: > > > > > >> To be honest, I am not sure. I would guess its not a *huge* deal, > >> though it was obviously enough of a concern to at least discuss it. I > >> can definitely say that I think the other issues which are being fixed > >> are substantially more important. > >> > > > > Ok then, will repost the revised patch later today. > > > > Ok sounds good. I did have a chance to take a closer look at your > proposal for the key data, and I think it makes a lot of sense. Do you > see it as a problem if we defer adding that? Or should we try to add > that notion now? That would need to go eventually via mainline, after some discussion. But yes, I believe that using the "key" as simple bitmask is a little restrictive. - Davide -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] virtio-serial: A guest <-> host interface for simple communication
Am Dienstag 23 Juni 2009 16:16:13 schrieb Paul Brook: > > I did some work on virtio-console, since kvm on s390 does not provide any > > other. I dont think we should mix two different types of devices into one > > driver. The only thing that these drivers have in common, is the fact > > that there are two virtqueues, piping data (single bytes or larger > > chunks). So you could make the same argument with the first virtio_net > > driver (the one before GSO) - which is obviously wrong. The common part > > of the transport is already factored out to virtio_ring and the > > transports. > > virtio-net is packet based, not stream based. You can argue that virtio-console is also packet based. The input buffer can accept up to 4K in one buffer and the console code can also submit larger chunks to virtio_console. > > In addition there are two ABIs involved: a userspace ABI (/dev/hvc0) and > > a guest/host ABI for this console. (and virtio was not meant to be a > > KVM-only interface, that we can change all the time). David A. Wheeler's > > 'SLOCCount' gives me 141 lines of code for virtio_console.c. I am quite > > confident that the saving we could achieve by merging these two drivers > > is not worth the hazzle. > > AFAICS the functionality provided is exactly the same. The host API is > identical, and the guest userspace API only has trivial differences (which > could be eliminated with a simple udev rule). By my reading virtio-serial > makes virtio-console entirely redundant. How can you know, that the userspace API only has trivial differences, if the question below is not answered? > > Discussion about merging the console code into this distracts from the > > main problem: To get the interface and functionality right before it > > becomes an ABI (is it /dev/ttyS, network like or is it something > > completely different?). > > Ah, now that's a different question. I don't know what the requirements are > for the higher level vmchannel interface. However I also don't care. You should care, because it might have an impact if two serial lines are really the right solution for the vmchannel. One thing that I forgot: You should be warned that hvc_console sometimes can be a real PITA. A while ago I tried to change virtio_console to support more than one console and hotplug and failed to find a proper solution that can handle all the subtle console/tty register/unregister combinations. You dont want to adopt new code to fit to hvc_console - leave it in virtio_console... -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] eventfd: add internal reference counting to fix notifier race conditions
On Tue, 23 Jun 2009, Rusty Russell wrote: > The first 'struct eventfd_ctx;' line is not required. Will repost dropping that. - Davide -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] eventfd: add internal reference counting to fix notifier race conditions
Davide Libenzi wrote: > On Mon, 22 Jun 2009, Gregory Haskins wrote: > > >> To be honest, I am not sure. I would guess its not a *huge* deal, >> though it was obviously enough of a concern to at least discuss it. I >> can definitely say that I think the other issues which are being fixed >> are substantially more important. >> > > Ok then, will repost the revised patch later today. > Ok sounds good. I did have a chance to take a closer look at your proposal for the key data, and I think it makes a lot of sense. Do you see it as a problem if we defer adding that? Or should we try to add that notion now? -Greg signature.asc Description: OpenPGP digital signature
Re: [PATCH 3/3] eventfd: add internal reference counting to fix notifier race conditions
On Mon, 22 Jun 2009, Gregory Haskins wrote: > To be honest, I am not sure. I would guess its not a *huge* deal, > though it was obviously enough of a concern to at least discuss it. I > can definitely say that I think the other issues which are being fixed > are substantially more important. Ok then, will repost the revised patch later today. - Davide -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2 v2] allow hypervisor CPUID bit to be overriden
Andre Przywara wrote: KVM defaults to the hypervisor CPUID bit to be set, whereas pure QEMU clears it. On some occasions one wants to set or clear it the other way round (for instance to get HyperV running inside a guest). Move the bit-set to be done before the command line parsing and enable it by default. One can disable it by using: -cpu qemu64,-hypervisor Fix some whitespace damage on the way. Signed-off-by: Andre Przywara Please also resend 2/2. -- Regards, Anthony Liguori -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] virtio-serial: A guest <-> host interface for simple communication
On Tuesday 23 June 2009, Christian Bornträger wrote: > Am Dienstag 23 Juni 2009 14:55:52 schrieb Paul Brook: > > > Here are two patches. One implements a virtio-serial device in qemu > > > and the other is the driver for a guest kernel. > > > > So I'll ask again. Why is this separate from virtio-console? > > I did some work on virtio-console, since kvm on s390 does not provide any > other. I dont think we should mix two different types of devices into one > driver. The only thing that these drivers have in common, is the fact that > there are two virtqueues, piping data (single bytes or larger chunks). So > you could make the same argument with the first virtio_net driver (the one > before GSO) - which is obviously wrong. The common part of the transport is > already factored out to virtio_ring and the transports. virtio-net is packet based, not stream based. > In addition there are two ABIs involved: a userspace ABI (/dev/hvc0) and a > guest/host ABI for this console. (and virtio was not meant to be a KVM-only > interface, that we can change all the time). David A. Wheeler's 'SLOCCount' > gives me 141 lines of code for virtio_console.c. I am quite confident that > the saving we could achieve by merging these two drivers is not worth the > hazzle. AFAICS the functionality provided is exactly the same. The host API is identical, and the guest userspace API only has trivial differences (which could be eliminated with a simple udev rule). By my reading virtio-serial makes virtio-console entirely redundant. > Discussion about merging the console code into this distracts from the main > problem: To get the interface and functionality right before it becomes an > ABI (is it /dev/ttyS, network like or is it something completely > different?). Ah, now that's a different question. I don't know what the requirements are for the higher level vmchannel interface. However I also don't care. Paul -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] virtio-serial: A guest <-> host interface for simple communication
Am Dienstag 23 Juni 2009 14:55:52 schrieb Paul Brook: > > Here are two patches. One implements a virtio-serial device in qemu > > and the other is the driver for a guest kernel. > > So I'll ask again. Why is this separate from virtio-console? I did some work on virtio-console, since kvm on s390 does not provide any other. I dont think we should mix two different types of devices into one driver. The only thing that these drivers have in common, is the fact that there are two virtqueues, piping data (single bytes or larger chunks). So you could make the same argument with the first virtio_net driver (the one before GSO) - which is obviously wrong. The common part of the transport is already factored out to virtio_ring and the transports. In addition there are two ABIs involved: a userspace ABI (/dev/hvc0) and a guest/host ABI for this console. (and virtio was not meant to be a KVM-only interface, that we can change all the time). David A. Wheeler's 'SLOCCount' gives me 141 lines of code for virtio_console.c. I am quite confident that the saving we could achieve by merging these two drivers is not worth the hazzle. Discussion about merging the console code into this distracts from the main problem: To get the interface and functionality right before it becomes an ABI (is it /dev/ttyS, network like or is it something completely different?). Christian -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM PATCH v8 3/3] KVM: add iosignalfd support
Michael S. Tsirkin wrote: > On Thu, Jun 18, 2009 at 08:30:46PM -0400, Gregory Haskins wrote: > >> +static int >> +iosignalfd_group_in_range(struct kvm_io_device *this, gpa_t addr, int len, >> + int is_write) >> +{ >> +struct _iosignalfd_group *p = to_group(this); >> + >> +return ((addr >= p->addr && (addr < p->addr + p->length))); >> +} >> > > I think I see a problem here. For virtio, we do not necessarily want all > virtqueues for a device to live in kernel: there might be control > virtqueues that we want to leave in userspace. Since this claims all > writes to a specific address, the signal never makes it to userspace. > > So based on this, I think you are right about the io_bus changes. If we accept your proposal this problem above is solved cleanly. Sorry for the resistance, but I just wanted to make sure we were doing the right thing. I am in agreement now. Kind Regards, -Greg signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] virtio-serial: A guest <-> host interface for simple communication
On Tuesday 23 June 2009, Amit Shah wrote: > On (Tue) Jun 23 2009 [13:55:52], Paul Brook wrote: > > > Here are two patches. One implements a virtio-serial device in qemu > > > and the other is the driver for a guest kernel. > > > > So I'll ask again. Why is this separate from virtio-console? > > I'm basically writing a vmchannel and found out that a lot can be shared > between some virtio devices. So I'm just trying to abstract out those > things in virtio-serial. Once we're sure virtio-serial is good and ready > to be merged, I will look at converting over virtio-console to the > virtio-serial interface. That doesn't really answer my question. We already have a virtual serial device (called virtio-console). Why are you inventing another one? Paul -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] virtio-serial: A guest <-> host interface for simple communication
On (Tue) Jun 23 2009 [13:55:52], Paul Brook wrote: > > Here are two patches. One implements a virtio-serial device in qemu > > and the other is the driver for a guest kernel. > > So I'll ask again. Why is this separate from virtio-console? I'm basically writing a vmchannel and found out that a lot can be shared between some virtio devices. So I'm just trying to abstract out those things in virtio-serial. Once we're sure virtio-serial is good and ready to be merged, I will look at converting over virtio-console to the virtio-serial interface. Is that an acceptable approach? Thanks, Amit -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] virtio-serial: A guest <-> host interface for simple communication
> Here are two patches. One implements a virtio-serial device in qemu > and the other is the driver for a guest kernel. So I'll ask again. Why is this separate from virtio-console? Paul -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] virtio-serial: virtio device for simple host <-> guest communication
This interface presents a char device from which bits can be sent and read. Sample uses for such a device can be obtaining info from the guest like the file systems used, apps installed, etc. for offline usage and logged-in users, clipboard copy-paste, etc. for online usage. Signed-off-by: Amit Shah --- Makefile.target|2 +- hw/pc.c| 17 +++ hw/pci.h |1 + hw/qdev.c |6 +- hw/virtio-pci.c| 15 +++ hw/virtio-serial.c | 354 hw/virtio-serial.h | 39 ++ hw/virtio.h|1 + monitor.c |9 ++ qemu-monitor.hx|7 + qemu-options.hx|8 ++ sysemu.h | 11 ++ vl.c | 63 + 13 files changed, 531 insertions(+), 2 deletions(-) create mode 100644 hw/virtio-serial.c create mode 100644 hw/virtio-serial.h diff --git a/Makefile.target b/Makefile.target index 08121a9..7a0deb3 100644 --- a/Makefile.target +++ b/Makefile.target @@ -550,7 +550,7 @@ OBJS=vl.o osdep.o monitor.o pci.o loader.o isa_mmio.o machine.o \ gdbstub.o gdbstub-xml.o # virtio has to be here due to weird dependency between PCI and virtio-net. # need to fix this properly -OBJS+=virtio-blk.o virtio-balloon.o virtio-net.o virtio-console.o +OBJS+=virtio-blk.o virtio-balloon.o virtio-net.o virtio-console.o virtio-serial.o ifdef CONFIG_KVM OBJS+=kvm.o kvm-all.o endif diff --git a/hw/pc.c b/hw/pc.c index cb5b4d0..6ccf4b8 100644 --- a/hw/pc.c +++ b/hw/pc.c @@ -40,6 +40,8 @@ #include "qemu-kvm.h" +void *virtio_serial_new_port(PCIDevice *dev, int idx, char *name); + /* output Bochs bios info messages */ //#define DEBUG_BIOS @@ -1204,6 +1206,21 @@ static void pc_init1(ram_addr_t ram_size, } } +/* Add virtio serial devices */ +if (pci_enabled && virtio_serial_index) { +void *dev; + +dev = pci_create_simple(pci_bus, -1, "virtio-serial-pci"); +if (!dev) { +fprintf(stderr, "qemu: could not create virtio serial pci device\n"); +exit(1); +} + +for (i = 0; i < virtio_serial_index; i++) { +virtio_serial_new_port(dev, i, virtio_serial_names[i]); +} +} + #ifdef USE_KVM_DEVICE_ASSIGNMENT if (kvm_enabled()) { add_assigned_devices(pci_bus, assigned_devices, assigned_devices_index); diff --git a/hw/pci.h b/hw/pci.h index d835b6d..fe5f29e 100644 --- a/hw/pci.h +++ b/hw/pci.h @@ -72,6 +72,7 @@ extern target_phys_addr_t pci_mem_base; #define PCI_DEVICE_ID_VIRTIO_BLOCK 0x1001 #define PCI_DEVICE_ID_VIRTIO_BALLOON 0x1002 #define PCI_DEVICE_ID_VIRTIO_CONSOLE 0x1003 +#define PCI_DEVICE_ID_VIRTIO_SERIAL 0x1004 typedef void PCIConfigWriteFunc(PCIDevice *pci_dev, uint32_t address, uint32_t data, int len); diff --git a/hw/qdev.c b/hw/qdev.c index 385e709..fe5bcd0 100644 --- a/hw/qdev.c +++ b/hw/qdev.c @@ -168,9 +168,13 @@ CharDriverState *qdev_init_chardev(DeviceState *dev) { static int next_serial; static int next_virtconsole; +static int next_virtserial; + /* FIXME: This is a nasty hack that needs to go away. */ -if (strncmp(dev->type->info->name, "virtio", 6) == 0) { +if (strncmp(dev->type->info->name, "virtio-console", 14) == 0) { return virtcon_hds[next_virtconsole++]; +} else if (strncmp(dev->type->info->name, "virtio-serial", 13) == 0) { +return virtio_serial_hds[next_virtserial++]; } else { return serial_hds[next_serial++]; } diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c index 24fe837..199f0c6 100644 --- a/hw/virtio-pci.c +++ b/hw/virtio-pci.c @@ -334,6 +334,19 @@ static void virtio_balloon_init_pci(PCIDevice *pci_dev) 0x00); } +static void virtio_serial_init_pci(PCIDevice *pci_dev) +{ +VirtIOPCIProxy *proxy = DO_UPCAST(VirtIOPCIProxy, pci_dev, pci_dev); +VirtIODevice *vdev; + +vdev = virtio_serial_init(&pci_dev->qdev); +virtio_init_pci(proxy, vdev, +PCI_VENDOR_ID_REDHAT_QUMRANET, +PCI_DEVICE_ID_VIRTIO_SERIAL, +PCI_CLASS_COMMUNICATION_OTHER, +0x00); +} + static void virtio_pci_register_devices(void) { pci_qdev_register("virtio-blk-pci", sizeof(VirtIOPCIProxy), @@ -344,6 +357,8 @@ static void virtio_pci_register_devices(void) virtio_console_init_pci); pci_qdev_register("virtio-balloon-pci", sizeof(VirtIOPCIProxy), virtio_balloon_init_pci); +pci_qdev_register("virtio-serial-pci", sizeof(VirtIOPCIProxy), + virtio_serial_init_pci); } device_init(virtio_pci_register_devices) diff --git a/hw/virtio-serial.c b/hw/virtio-serial.c new file mode 100644 index 000..5162316 --- /dev/null +++ b/hw/virtio-serial.c @@ -0,0 +1,354 @@ +/* + * Virtio serial interface + * + * This serial interface is a paravirtualised guest<->host + *
[PATCH] virtio_serial: A char device for simple guest <-> host communication
We expose multiple char devices ("ports") for simple communication between the host userspace and guest. This is mainly intended for obtaining information from the guest. Sample offline usages can be: poking around in a guest to find out the file systems used, applications installed, etc. Online usages can be sharing of clipboard data between the guest and the host, sending information about logged-in users to the host, locking the screen or session when a vnc session is closed, and so on. The interface presented to guest userspace is of a simple char device, so it can be used like this: fd = open("/dev/vmch0", O_RDWR); ret = read(fd, buf, 100); ret = write(fd, string, strlen(string)); ret = ioctl(fd, VIRTIO_SERIAL_IOCTL_GET_PORT_NAME, &port_name); Signed-off-by: Amit Shah --- drivers/char/Kconfig |6 + drivers/char/Makefile |1 + drivers/char/virtio_serial.c | 571 + include/linux/virtio_serial.h | 42 +++ 4 files changed, 620 insertions(+), 0 deletions(-) create mode 100644 drivers/char/virtio_serial.c create mode 100644 include/linux/virtio_serial.h diff --git a/drivers/char/Kconfig b/drivers/char/Kconfig index 30bae6d..84a1718 100644 --- a/drivers/char/Kconfig +++ b/drivers/char/Kconfig @@ -679,6 +679,12 @@ config VIRTIO_CONSOLE help Virtio console for use with lguest and other hypervisors. +config VIRTIO_SERIAL + tristate "Virtio serial" + select VIRTIO + select VIRTIO_RING + help + Virtio serial device driver for simple guest and host communication config HVCS tristate "IBM Hypervisor Virtual Console Server support" diff --git a/drivers/char/Makefile b/drivers/char/Makefile index 189efcf..0b6c71e 100644 --- a/drivers/char/Makefile +++ b/drivers/char/Makefile @@ -54,6 +54,7 @@ obj-$(CONFIG_HVC_XEN) += hvc_xen.o obj-$(CONFIG_HVC_IUCV) += hvc_iucv.o obj-$(CONFIG_HVC_UDBG) += hvc_udbg.o obj-$(CONFIG_VIRTIO_CONSOLE) += virtio_console.o +obj-$(CONFIG_VIRTIO_SERIAL)+= virtio_serial.o obj-$(CONFIG_RAW_DRIVER) += raw.o obj-$(CONFIG_SGI_SNSC) += snsc.o snsc_event.o obj-$(CONFIG_MSPEC)+= mspec.o diff --git a/drivers/char/virtio_serial.c b/drivers/char/virtio_serial.c new file mode 100644 index 000..6c5ad5d --- /dev/null +++ b/drivers/char/virtio_serial.c @@ -0,0 +1,571 @@ +/* + * VirtIO Serial driver + * + * This is paravirtualised serial guest<->host communication channel + * for relaying short messages and events in either direction. + * + * One PCI device can have multiple serial ports so that different + * applications can communicate without polluting the PCI device space + * in the guest. + * + * Copyright (C) 2009, Red Hat, Inc. + * + * Author(s): Amit Shah + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +struct virtio_serial_struct { + struct virtio_device *vdev; + struct virtqueue *ctrl_vq; + struct virtqueue *in_vq, *out_vq; + struct virtio_serial_port *ports; + u32 nr_ports; + struct work_struct rx_work; + struct work_struct queue_work; +}; + +struct virtio_serial_port_buffer { + unsigned int len; + unsigned int offset; + char *buf; + + struct list_head list; +}; + +struct virtio_serial_id { + u32 id; /* Port number */ +}; + +struct virtio_serial_port { + struct virtio_serial_struct *virtserial; + + /* The name given to this channel, if any. NOT zero-terminated */ + char *name; + + /* Each port associates with a separate char device */ + dev_t dev; + struct cdev cdev; + + /* Buffer management */ + struct virtio_serial_port_buffer read_buf; + struct list_head readbuf_head; +}; + +static struct virtio_serial_struct virtserial; + +/* This should become per-port data */ +static DECLARE_COMPLETION(have_data); + +static int major = 60; /* from the experimental range */ + +static struct virtio_serial_port *get_port_from_id(u32 id) +{ + if (id > virtserial.nr_ports) + return NULL; + + return &virtserial.ports[id]; +} + +static int get_id_from_port(struct virt
virtio-serial: A guest <-> host interface for simple communication
Hello, Here are two patches. One implements a virtio-serial device in qemu and the other is the driver for a guest kernel. While working on a vmchannel interface that is needed for communication between guest userspace and host userspace, I saw that most of the interface can be abstracted out as a "serial" device with "ports". Some requirements for a vmchannel are listed at this page: http://kvm.et.redhat.com/page/VMchannel_Requirements A few sample uses for a vmchannel are to share the host and guest clipboards (to allow copy/paste between a host and a guest), to lock the screen of the guest session when the vnc viewer is closed, to find out which applications are installed on a guest OS even when the guest is powered down (using virt-inspector) and so on. At this time, the qemu device is more complete than the guest driver is. However, I'd like you to review both the patches for inclusion. Thanks, Amit. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2 v2] allow hypervisor CPUID bit to be overriden
KVM defaults to the hypervisor CPUID bit to be set, whereas pure QEMU clears it. On some occasions one wants to set or clear it the other way round (for instance to get HyperV running inside a guest). Move the bit-set to be done before the command line parsing and enable it by default. One can disable it by using: -cpu qemu64,-hypervisor Fix some whitespace damage on the way. Signed-off-by: Andre Przywara --- target-i386/helper.c | 23 +++ 1 files changed, 11 insertions(+), 12 deletions(-) diff --git a/target-i386/helper.c b/target-i386/helper.c index 8a76abd..cb9113e 100644 --- a/target-i386/helper.c +++ b/target-i386/helper.c @@ -44,7 +44,7 @@ static const char *ext_feature_name[] = { "pni" /* Intel,AMD sse3 */, NULL, NULL, "monitor", "ds_cpl", "vmx", NULL /* Linux smx */, "est", "tm2", "ssse3", "cid", NULL, NULL, "cx16", "xtpr", NULL, NULL, NULL, "dca", NULL, NULL, NULL, NULL, "popcnt", - NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, +NULL, NULL, NULL, NULL, NULL, NULL, NULL, "hypervisor", }; static const char *ext2_feature_name[] = { "fpu", "vme", "de", "pse", "tsc", "msr", "pae", "mce", @@ -59,30 +59,30 @@ static const char *ext3_feature_name[] = { NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, }; -static void add_flagname_to_bitmaps(char *flagname, uint32_t *features, -uint32_t *ext_features, -uint32_t *ext2_features, +static void add_flagname_to_bitmaps(const char *flagname, uint32_t *features, +uint32_t *ext_features, +uint32_t *ext2_features, uint32_t *ext3_features) { int i; int found = 0; -for ( i = 0 ; i < 32 ; i++ ) +for ( i = 0 ; i < 32 ; i++ ) if (feature_name[i] && !strcmp (flagname, feature_name[i])) { *features |= 1 << i; found = 1; } -for ( i = 0 ; i < 32 ; i++ ) +for ( i = 0 ; i < 32 ; i++ ) if (ext_feature_name[i] && !strcmp (flagname, ext_feature_name[i])) { *ext_features |= 1 << i; found = 1; } -for ( i = 0 ; i < 32 ; i++ ) +for ( i = 0 ; i < 32 ; i++ ) if (ext2_feature_name[i] && !strcmp (flagname, ext2_feature_name[i])) { *ext2_features |= 1 << i; found = 1; } -for ( i = 0 ; i < 32 ; i++ ) +for ( i = 0 ; i < 32 ; i++ ) if (ext3_feature_name[i] && !strcmp (flagname, ext3_feature_name[i])) { *ext3_features |= 1 << i; found = 1; @@ -330,6 +330,9 @@ static int cpu_x86_find_by_name(x86_def_t *x86_cpu_def, const char *cpu_model) goto error; memcpy(x86_cpu_def, def, sizeof(*def)); +add_flagname_to_bitmaps("hypervisor", &plus_features, +&plus_ext_features, &plus_ext2_features, &plus_ext3_features); + featurestr = strtok(NULL, ","); while (featurestr) { @@ -1523,10 +1526,6 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count, *ebx = (env->cpuid_apic_id << 24) | 8 << 8; /* CLFLUSH size in quad words, Linux wants it. */ *ecx = env->cpuid_ext_features; *edx = env->cpuid_features; - -/* "Hypervisor present" bit required for Microsoft SVVP */ -if (kvm_enabled()) -*ecx |= (1 << 31); break; case 2: /* cache info: needed for Pentium Pro compatibility */ -- 1.6.1.3 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] pass write value to in_range pointers
On 06/23/2009 03:01 PM, Gregory Haskins wrote: Ok, so for now I will just crank up the io_bus array, and we can address scale another day. Can I just drop patch 2/3 and let the io_bus govern the limit? So long as we have a runtime-discoverable limit, yes. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] Planning for the 0.11.0 release
On 06/23/2009 02:57 AM, Anthony Liguori wrote: Hi, It's getting to be about the time to start thinking about the 0.11.0 release. 0.10.0 was released on March 2nd so following with the 6 month release cycle, that would put 0.11.0 at September 2nd. Based on the experiences with the stable releases, here's what I'd recommend: Seems sane to me. o Release qemu-0.11.0-rc1 o Release additional -rcN releases every 1-2 weeks I'll release qemu-kvm-0.11.0-rc* in parallel so we can get feedback from kvm users as well. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] pass write value to in_range pointers
Avi Kivity wrote: > On 06/23/2009 02:44 PM, Michael S. Tsirkin wrote: >> On Tue, Jun 23, 2009 at 12:04:06AM -0400, Gregory Haskins wrote: >> > It will also need to support > multiple matches. > > What, signal many fds on the same address/value pair? I see this as a bug. Why is this a good thing to support? Just increases the chance of leaking this fd. >>> I believe Avi asked for this feature specifically, so I will defer >>> to him. >>> >> >> Hmm. That's hard to implement in my model. Avi, can we give up >> this feature? I don't think anyone needs this specifically ... >> > > I think we can make do with passing that single eventfd to multiple > consumers. It means their event count reads may return zero, but I > guess we can live with that. > > I do want to retain flexibility in how we route events. > Ok, so for now I will just crank up the io_bus array, and we can address scale another day. Can I just drop patch 2/3 and let the io_bus govern the limit? -Greg signature.asc Description: OpenPGP digital signature
Re: [PATCH RFC] pass write value to in_range pointers
On 06/23/2009 02:44 PM, Michael S. Tsirkin wrote: On Tue, Jun 23, 2009 at 12:04:06AM -0400, Gregory Haskins wrote: It will also need to support multiple matches. What, signal many fds on the same address/value pair? I see this as a bug. Why is this a good thing to support? Just increases the chance of leaking this fd. I believe Avi asked for this feature specifically, so I will defer to him. Hmm. That's hard to implement in my model. Avi, can we give up this feature? I don't think anyone needs this specifically ... I think we can make do with passing that single eventfd to multiple consumers. It means their event count reads may return zero, but I guess we can live with that. I do want to retain flexibility in how we route events. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] pass write value to in_range pointers
Michael S. Tsirkin wrote: > On Tue, Jun 23, 2009 at 12:04:06AM -0400, Gregory Haskins wrote: > It will also need to support multiple matches. >>> What, signal many fds on the same address/value pair? >>> I see this as a bug. Why is this a good thing to support? >>> Just increases the chance of leaking this fd. >>> >>> >> I believe Avi asked for this feature specifically, so I will defer to him. >> > > Hmm. That's hard to implement in my model. Avi, can we give up > this feature? I don't think anyone needs this specifically ... > > If we can drop this it will simplify the mods I will need to do to io_bus to support Michael's new interface. Out of curiosity, what is it you are trying to build Michael? -Greg signature.asc Description: OpenPGP digital signature
Re: [PATCH RFC] pass write value to in_range pointers
On 06/23/2009 07:04 AM, Gregory Haskins wrote: Well, for one its not very clear what the benefit of the read/write aliasing even is. ;) Apparently coalesced_mmio uses it, but even so I doubt that is for the purposes of having one device do reads while another does writes. I could be wrong, though. Coalesced mmio cannot handle reads. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] pass write value to in_range pointers
On Tue, Jun 23, 2009 at 07:41:12AM -0400, Gregory Haskins wrote: > Avi Kivity wrote: > > On 06/22/2009 07:08 PM, Michael S. Tsirkin wrote: > >> On Mon, Jun 22, 2009 at 11:45:00AM -0400, Gregory Haskins wrote: > >> > >>> Michael S. Tsirkin wrote: > >>> > It seems that a lot of complexity and trickiness with iosignalfd is > handling the group/item relationship, which comes about because kvm > does > not currently let a device on the bus claim a write transaction > based on the > value written. This could be greatly simplified if the value written > was passed to the in_range check for write operation. We could then > simply make each kvm_iosignalfd a device on the bus. > > What does everyone think of the following lightly tested patch? > > > >>> Hi Michael, > >>>Its interesting, but I am not convinced its necessary. We > >>> created the > >>> group/item layout because iosignalfds are unique in that they are > >>> probably the only IO device that wants to do some kind of address > >>> aliasing. > >>> > >> > >> We actually already have aliasing: is_write flag is used for this > >> purpose. Actually, it's possible to remove is_write by passing > >> a null pointer in write_val for reads. I like this a bit less as > >> the code generated is less compact ... Avi, what do you think? > >> > > > > Greg, won't Michael's patch eliminate a big chunk from your iosignalfd > > patches? Seems like a win to me. > > Well, it really just moves that hunk from eventfd.c to kvm_main.c, where > I don't think anyone else will use it by iosignalfd. But if that is > what everyone wants, I guess I have no choice. Wait a bit before you start rebasing though please. I just had a brainwave and is rewriting this patch. > > > >> One is enough :) > >> Seriously, do you see that this saves you all of RCU, linked lists and > >> counters? You don't need to keep track of iofds, you don't need to > >> implement your own lookup logic - you just use the kvm device > >> and that's it. > >> > >> > > > > Yup. > > > > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] pass write value to in_range pointers
On Tue, Jun 23, 2009 at 12:04:06AM -0400, Gregory Haskins wrote: > >> It will also need to support > >> multiple matches. > >> > > > > What, signal many fds on the same address/value pair? > > I see this as a bug. Why is this a good thing to support? > > Just increases the chance of leaking this fd. > > > > I believe Avi asked for this feature specifically, so I will defer to him. Hmm. That's hard to implement in my model. Avi, can we give up this feature? I don't think anyone needs this specifically ... -- MST -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] Re: [PATCH 1/2] allow hypervisor CPUID bit to be overriden
> I agree it's pointless, but it is a Microsoft requirement for passing > their SVVP tests. Enabling it by default makes life a little easier for > users who wish to validate their hypervisor and has no drawbacks. I wasn't arguing against setting it by default (for QEMU CPU types), just against enabling it unconditionally. Paul -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] pass write value to in_range pointers
Avi Kivity wrote: > On 06/22/2009 07:08 PM, Michael S. Tsirkin wrote: >> On Mon, Jun 22, 2009 at 11:45:00AM -0400, Gregory Haskins wrote: >> >>> Michael S. Tsirkin wrote: >>> It seems that a lot of complexity and trickiness with iosignalfd is handling the group/item relationship, which comes about because kvm does not currently let a device on the bus claim a write transaction based on the value written. This could be greatly simplified if the value written was passed to the in_range check for write operation. We could then simply make each kvm_iosignalfd a device on the bus. What does everyone think of the following lightly tested patch? >>> Hi Michael, >>>Its interesting, but I am not convinced its necessary. We >>> created the >>> group/item layout because iosignalfds are unique in that they are >>> probably the only IO device that wants to do some kind of address >>> aliasing. >>> >> >> We actually already have aliasing: is_write flag is used for this >> purpose. Actually, it's possible to remove is_write by passing >> a null pointer in write_val for reads. I like this a bit less as >> the code generated is less compact ... Avi, what do you think? >> > > Greg, won't Michael's patch eliminate a big chunk from your iosignalfd > patches? Seems like a win to me. Well, it really just moves that hunk from eventfd.c to kvm_main.c, where I don't think anyone else will use it by iosignalfd. But if that is what everyone wants, I guess I have no choice. > >> One is enough :) >> Seriously, do you see that this saves you all of RCU, linked lists and >> counters? You don't need to keep track of iofds, you don't need to >> implement your own lookup logic - you just use the kvm device >> and that's it. >> >> > > Yup. > signature.asc Description: OpenPGP digital signature
Re: [KVM PATCH v8 3/3] KVM: add iosignalfd support
Michael S. Tsirkin wrote: > On Tue, Jun 23, 2009 at 07:33:07AM -0400, Gregory Haskins wrote: > >> Michael S. Tsirkin wrote: >> >>> On Thu, Jun 18, 2009 at 08:30:46PM -0400, Gregory Haskins wrote: >>> >>> +static int +iosignalfd_group_in_range(struct kvm_io_device *this, gpa_t addr, int len, +int is_write) +{ + struct _iosignalfd_group *p = to_group(this); + + return ((addr >= p->addr && (addr < p->addr + p->length))); +} >>> I think I see a problem here. For virtio, we do not necessarily want all >>> virtqueues for a device to live in kernel: there might be control >>> virtqueues that we want to leave in userspace. Since this claims all >>> writes to a specific address, the signal never makes it to userspace. >>> >>> >>> >> You can use a wildcard. Would that work? >> >> > > Nope, you need to know the value written. > > Note: You can also terminate an eventfd in userspace...it doesn't have to terminate in-kernel. Not sure if that helps. signature.asc Description: OpenPGP digital signature
Re: [KVM PATCH v8 3/3] KVM: add iosignalfd support
On Tue, Jun 23, 2009 at 07:33:07AM -0400, Gregory Haskins wrote: > Michael S. Tsirkin wrote: > > On Thu, Jun 18, 2009 at 08:30:46PM -0400, Gregory Haskins wrote: > > > >> +static int > >> +iosignalfd_group_in_range(struct kvm_io_device *this, gpa_t addr, int len, > >> +int is_write) > >> +{ > >> + struct _iosignalfd_group *p = to_group(this); > >> + > >> + return ((addr >= p->addr && (addr < p->addr + p->length))); > >> +} > >> > > > > I think I see a problem here. For virtio, we do not necessarily want all > > virtqueues for a device to live in kernel: there might be control > > virtqueues that we want to leave in userspace. Since this claims all > > writes to a specific address, the signal never makes it to userspace. > > > > > You can use a wildcard. Would that work? > Nope, you need to know the value written. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] Re: [PATCH 1/2] allow hypervisor CPUID bit to be overriden
On 06/23/2009 02:31 PM, Paul Brook wrote: On Tuesday 23 June 2009, Avi Kivity wrote: On 06/23/2009 12:47 AM, Andre Przywara wrote: KVM defaults to the hypervisor CPUID bit to be set, whereas pure QEMU clears it. On some occasions one want to set or clear it the other way round (for instance to get HyperV running inside a guest). Allow the default to be overridden on the command line and fix some whitespace damage on the way. It makes sense for qemu to set the hypervisor bit unconditionally. A guest running under qemu is not bare metal. I see no reason why a guest has to be told that it's running inside a VM. In principle an appropriately configured qemu should be indistinguishable from real hardware. In practice it's technically infeasible to cover absolutely everything, but if we set this bit we're not even trying. I have no objection to the bit being set by default for the QEMU CPU types. I agree it's pointless, but it is a Microsoft requirement for passing their SVVP tests. Enabling it by default makes life a little easier for users who wish to validate their hypervisor and has no drawbacks. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM PATCH v8 3/3] KVM: add iosignalfd support
Michael S. Tsirkin wrote: > On Thu, Jun 18, 2009 at 08:30:46PM -0400, Gregory Haskins wrote: > >> +static int >> +iosignalfd_group_in_range(struct kvm_io_device *this, gpa_t addr, int len, >> + int is_write) >> +{ >> +struct _iosignalfd_group *p = to_group(this); >> + >> +return ((addr >= p->addr && (addr < p->addr + p->length))); >> +} >> > > I think I see a problem here. For virtio, we do not necessarily want all > virtqueues for a device to live in kernel: there might be control > virtqueues that we want to leave in userspace. Since this claims all > writes to a specific address, the signal never makes it to userspace. > > You can use a wildcard. Would that work? signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] Re: [PATCH 1/2] allow hypervisor CPUID bit to be overriden
On Tuesday 23 June 2009, Avi Kivity wrote: > On 06/23/2009 12:47 AM, Andre Przywara wrote: > > KVM defaults to the hypervisor CPUID bit to be set, whereas pure QEMU > > clears it. On some occasions one want to set or clear it the other way > > round (for instance to get HyperV running inside a guest). > > Allow the default to be overridden on the command line and fix some > > whitespace damage on the way. > > It makes sense for qemu to set the hypervisor bit unconditionally. A > guest running under qemu is not bare metal. I see no reason why a guest has to be told that it's running inside a VM. In principle an appropriately configured qemu should be indistinguishable from real hardware. In practice it's technically infeasible to cover absolutely everything, but if we set this bit we're not even trying. I have no objection to the bit being set by default for the QEMU CPU types. Paul -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM PATCH v8 3/3] KVM: add iosignalfd support
On 06/23/2009 01:48 PM, Michael S. Tsirkin wrote: On Tue, Jun 23, 2009 at 12:57:53PM +0300, Avi Kivity wrote: On 06/23/2009 11:56 AM, Michael S. Tsirkin wrote: On Thu, Jun 18, 2009 at 08:30:46PM -0400, Gregory Haskins wrote: +static int +iosignalfd_group_in_range(struct kvm_io_device *this, gpa_t addr, int len, + int is_write) +{ + struct _iosignalfd_group *p = to_group(this); + + return ((addr>= p->addr&& (addr< p->addr + p->length))); +} I think I see a problem here. For virtio, we do not necessarily want all virtqueues for a device to live in kernel: there might be control virtqueues that we want to leave in userspace. Since this claims all writes to a specific address, the signal never makes it to userspace. Userspace could create an eventfd for this control queue and wait for it to fire. What if guest writes an unexpected value there? The value it simply lost ... that's not very elegant. True, it's better to have a lossless interface. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[ kvm-Bugs-2810749 ] linux guest framebuffer fails under vnc
Bugs item #2810749, was opened at 2009-06-23 09:23 Message generated for change (Comment added) made by elfe You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2810749&group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Elfe (elfe) Assigned to: Nobody/Anonymous (nobody) Summary: linux guest framebuffer fails under vnc Initial Comment: When running a linux with framebuffer enabled the vnc server becomes unusable and causes the viewer to exit. (reconnecting to the vnc server does not work as well) The machines seems to be still up as it can respond to pings. booting the gentoo cd with the gentoo-nofb kernel works well I don't know if it's related but switching a windows guest to a higher resolution than 1024x768 caues the vnc client to exit as well. (it can be accessed again after waiting the 15 seconds for the display mode confirmation timeout) cpu: 1x Intel(R) Xeon(R) CPU X3320 @ 2.50GHz kvm version: 0.85 (0.86 is not on gentoo atm) host kernel: 2.6.29-gentoo-r5 host os/architecture: gentoo x86_64 guest os: gentoo minimal install disc for x86_64, OpenSUSE 11.1 i586 commandline: kvm -drive file=/install-amd64-minimal-2008.0-r1.iso,if=ide,media=cdrom -m 768 -vnc :35 -smp 1 -usbdevice tablet -serial none -parallel none not helping: -no-kvm-irqchip -no-kvm-pit -no-kvm the vncviewer returns errors like: vncviewer PreferredEncoding=hextile x.x.x.x:35 VNC Viewer Free Edition 4.1.3 for X - built Apr 17 2009 14:25:08 Copyright (C) 2002-2008 RealVNC Ltd. See http://www.realvnc.com for information on VNC. Tue Jun 23 09:01:06 2009 CConn: connected to host x.x.x.x port 5935 CConnection: Server supports RFB protocol version 3.8 CConnection: Using RFB protocol version 3.8 TXImage: Using default colormap and visual, TrueColor, depth 24. CConn: Using pixel format depth 6 (8bpp) rgb222 CConn: Using hextile encoding Rect too big: 1x0 at 1,1024 exceeds 1024x768 Tue Jun 23 09:01:08 2009 main:Rect too big -- >Comment By: Elfe (elfe) Date: 2009-06-23 13:15 Message: I was able to work around the issue on OpenSUSE by changing the VESA 800x600 at 16bit to 800x600 with 24 bit color depth by using SDL for access. But the general issue with vnc is still there and should be addressed. -- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2810749&group_id=180599 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2][RFC] Kernel changes for HPET legacy mode (v7)
Beth Kon wrote: > Avi Kivity wrote: >> On 06/22/2009 12:14 PM, Jan Kiszka wrote: > Hmm, stead of introducing a new pair of singe-purpose IOCTLs, why not > add KVM_GET/SET_PIT2 which exchanges an extended kvm_pit_state2. And > that struct should also include some flags field and enough padding to > be potentially extended yet again in the future. In that case I see no > problem having also a mode read-back interface. > > We'd only add kernel hpet if we were forced to (I imagine the same applications/kernels that forced the PIT into the kernel will do the same for HPET). >>> >>> Answer and citation does not yet correlate for me. >>> >> >> Misquote. I meant to reply to your 'Is it planned to add in-kernel >> hpet support?' question. Must be early in the morning in some timezone. >> >>> Could you comment more explicitly if your are fine with Beth's proposed >>> interface, rather prefer something like my suggestion or even want >>> something totally different? >>> >> >> GET/SET PIT2 looks like the best choice to me, at least until I find >> whoever designed the HPET/PIT interdependency and make him take it back. >> > It seems to me that GET/SET PIT2 adds a good deal of complexity without > any gain. PIT is not a very dynamic technology. GET/SET PIT works for > PIT operational needs as defined by the PIT specifications. This whole > problem existe because of the unfortunate requirement that hpet disable > PIT interrupts, which is quite outside normal PIT operation. If I create > a GET/SET PIT2, and a PITState2 that is a superset of PITState1, then I > need to address all the cases where PITState is currently set/referenced > and properly use PITState2/PITState, depending on whether the kernel > supports PITState2. It just seems unnecessary, since hpet_legacy, and > probably any other future "control" of the PIT will likely be outside of > "normal" PIT operation. I really think a separate ioctl that transfers > just control information (of which one of the flag bits would be > hpet_legacy) would be preferable and cleaner. Am I missing some > advantage to PITState2? Or is there some simple way to implement this > that I'm missing? I think you just need to refactor out the processing of kvm_pit_channel_state and call it from the legacy get/set handler as well as from the new one. And the new handlers will also process the additional fields that will come with kvm_pit_state2. Not really more complex than the current proposal. Jan -- Siemens AG, Corporate Technology, CT SE 2 Corporate Competence Center Embedded Linux -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM PATCH v8 3/3] KVM: add iosignalfd support
On Tue, Jun 23, 2009 at 12:57:53PM +0300, Avi Kivity wrote: > On 06/23/2009 11:56 AM, Michael S. Tsirkin wrote: >> On Thu, Jun 18, 2009 at 08:30:46PM -0400, Gregory Haskins wrote: >> >>> +static int >>> +iosignalfd_group_in_range(struct kvm_io_device *this, gpa_t addr, int len, >>> + int is_write) >>> +{ >>> + struct _iosignalfd_group *p = to_group(this); >>> + >>> + return ((addr>= p->addr&& (addr< p->addr + p->length))); >>> +} >>> >> >> I think I see a problem here. For virtio, we do not necessarily want all >> virtqueues for a device to live in kernel: there might be control >> virtqueues that we want to leave in userspace. Since this claims all >> writes to a specific address, the signal never makes it to userspace. >> > > Userspace could create an eventfd for this control queue and wait for it > to fire. What if guest writes an unexpected value there? The value it simply lost ... that's not very elegant. -- MST -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] introduce -cpu host target
On 06/23/2009 12:47 AM, Andre Przywara wrote: Although the guest's CPUID bits can be controlled in a fine grained way in QEMU, a simple way to inject the host CPU is missing. This is handy for KVM desktop virtualization, where one wants the guest to support the full host feature set. Introduce another CPU type called 'host', which will propagate the host's CPUID bits to the guest. Problematic bits can still be turned off by using the existing syntax (-cpu host,-skinit) kvm already knows how to filter unknown bits to prevent runtime failures. This is even more important for qemu/tcg, since even simple bits which only define new instructions need explicit support in qemu. I think -cpu host should default to filtering unsupported bits instead of hoping the guest will ignore them or expecting the user to know which bits to remove. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] allow hypervisor CPUID bit to be overriden
On 06/23/2009 12:47 AM, Andre Przywara wrote: KVM defaults to the hypervisor CPUID bit to be set, whereas pure QEMU clears it. On some occasions one want to set or clear it the other way round (for instance to get HyperV running inside a guest). Allow the default to be overridden on the command line and fix some whitespace damage on the way. It makes sense for qemu to set the hypervisor bit unconditionally. A guest running under qemu is not bare metal. That's a separate change though. Patch looks good. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM PATCH v8 3/3] KVM: add iosignalfd support
On 06/23/2009 11:56 AM, Michael S. Tsirkin wrote: On Thu, Jun 18, 2009 at 08:30:46PM -0400, Gregory Haskins wrote: +static int +iosignalfd_group_in_range(struct kvm_io_device *this, gpa_t addr, int len, + int is_write) +{ + struct _iosignalfd_group *p = to_group(this); + + return ((addr>= p->addr&& (addr< p->addr + p->length))); +} I think I see a problem here. For virtio, we do not necessarily want all virtqueues for a device to live in kernel: there might be control virtqueues that we want to leave in userspace. Since this claims all writes to a specific address, the signal never makes it to userspace. Userspace could create an eventfd for this control queue and wait for it to fire. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] pass write value to in_range pointers
On 06/22/2009 07:29 PM, Gregory Haskins wrote: We actually already have aliasing: is_write flag is used for this purpose. Yes, but read/write address aliasing is not the same thing is multi-match data aliasing. Besides, your proposal also breaks some of the natural relationship models (e.g. all the aliased iosignal_items always belong to the same underlying device. io_bus entries have an arbitrary topology). It's all one big hack, we want to get the correct function called with as little fuss as possible. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] pass write value to in_range pointers
On 06/22/2009 07:08 PM, Michael S. Tsirkin wrote: On Mon, Jun 22, 2009 at 11:45:00AM -0400, Gregory Haskins wrote: Michael S. Tsirkin wrote: It seems that a lot of complexity and trickiness with iosignalfd is handling the group/item relationship, which comes about because kvm does not currently let a device on the bus claim a write transaction based on the value written. This could be greatly simplified if the value written was passed to the in_range check for write operation. We could then simply make each kvm_iosignalfd a device on the bus. What does everyone think of the following lightly tested patch? Hi Michael, Its interesting, but I am not convinced its necessary. We created the group/item layout because iosignalfds are unique in that they are probably the only IO device that wants to do some kind of address aliasing. We actually already have aliasing: is_write flag is used for this purpose. Actually, it's possible to remove is_write by passing a null pointer in write_val for reads. I like this a bit less as the code generated is less compact ... Avi, what do you think? Greg, won't Michael's patch eliminate a big chunk from your iosignalfd patches? Seems like a win to me. One is enough :) Seriously, do you see that this saves you all of RCU, linked lists and counters? You don't need to keep track of iofds, you don't need to implement your own lookup logic - you just use the kvm device and that's it. Yup. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM PATCH v8 3/3] KVM: add iosignalfd support
On Thu, Jun 18, 2009 at 08:30:46PM -0400, Gregory Haskins wrote: > +static int > +iosignalfd_group_in_range(struct kvm_io_device *this, gpa_t addr, int len, > + int is_write) > +{ > + struct _iosignalfd_group *p = to_group(this); > + > + return ((addr >= p->addr && (addr < p->addr + p->length))); > +} I think I see a problem here. For virtio, we do not necessarily want all virtqueues for a device to live in kernel: there might be control virtqueues that we want to leave in userspace. Since this claims all writes to a specific address, the signal never makes it to userspace. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html