date:20090623

[ kvm-Bugs-2810749 ] linux guest framebuffer fails under vnc

2009-06-23 Thread SourceForge.net

Bugs item #2810749, was opened at 2009-06-23 09:23
Message generated for change (Comment added) made by elfe
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2810749&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Private: No
Submitted By: Elfe (elfe)
Assigned to: Nobody/Anonymous (nobody)
Summary: linux guest framebuffer fails under vnc

Initial Comment:
When running a linux with framebuffer enabled the vnc server becomes unusable 
and causes the viewer to exit. (reconnecting to the vnc server does not work as 
well)
The machines seems to be still up as it can respond to pings.
booting the gentoo cd with the gentoo-nofb kernel works well
I don't know if it's related but switching a windows guest to a higher 
resolution than 1024x768 caues the vnc client to exit as well. (it can be 
accessed again after waiting the 15 seconds for the display mode confirmation 
timeout)

cpu: 1x Intel(R) Xeon(R) CPU X3320  @ 2.50GHz
kvm version: 0.85 (0.86 is not on gentoo atm)
host kernel: 2.6.29-gentoo-r5
host os/architecture: gentoo x86_64
guest os: gentoo minimal install disc for x86_64, OpenSUSE 11.1 i586
commandline: kvm -drive 
file=/install-amd64-minimal-2008.0-r1.iso,if=ide,media=cdrom -m 768 -vnc :35 
-smp 1 -usbdevice tablet -serial none -parallel none
not helping: -no-kvm-irqchip -no-kvm-pit  -no-kvm

the vncviewer returns errors like:
vncviewer PreferredEncoding=hextile x.x.x.x:35

VNC Viewer Free Edition 4.1.3 for X - built Apr 17 2009 14:25:08
Copyright (C) 2002-2008 RealVNC Ltd.
See http://www.realvnc.com for information on VNC.

Tue Jun 23 09:01:06 2009
 CConn:   connected to host x.x.x.x port 5935
 CConnection: Server supports RFB protocol version 3.8
 CConnection: Using RFB protocol version 3.8
 TXImage: Using default colormap and visual, TrueColor, depth 24.
 CConn:   Using pixel format depth 6 (8bpp) rgb222
 CConn:   Using hextile encoding
Rect too big: 1x0 at 1,1024 exceeds 1024x768

Tue Jun 23 09:01:08 2009
 main:Rect too big


--

>Comment By: Elfe (elfe)
Date: 2009-06-24 08:14

Message:
seems to be working in qemu-kvm-0.10.5

--

Comment By: Elfe (elfe)
Date: 2009-06-23 13:15

Message:
I was able to work around the issue on OpenSUSE by changing the VESA
800x600 at 16bit to 800x600 with 24 bit color depth by using SDL for
access.
But the general issue with vnc is still there and should be addressed.

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2810749&group_id=180599
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: virtio-serial: A guest <-> host interface for simple communication

2009-06-23 Thread Rusty Russell

On Tue, 23 Jun 2009 10:12:31 pm Amit Shah wrote:
> Hello,
>
> Here are two patches. One implements a virtio-serial device in qemu
> and the other is the driver for a guest kernel.
>
> While working on a vmchannel interface that is needed for communication
> between guest userspace and host userspace, I saw that most of the
> interface can be abstracted out as a "serial" device with "ports".

OK, I don't think the "naming" idea works though.  A userspace user would have 
to open each one in turn to get its name.  I'd stick with numbers.

You also don't have dynamic creation and removal, except by hotpluging the 
entire device (which was on your requirements page).

I'd put a size and bitmap in the configuration space, and use that to indicate 
what ports exist.  Register on the change interrupt to get updates.  Drop the 
control vq entirely.

Cheers,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH/RFC] virtio_test: A module for testing virtio via userspace

2009-06-23 Thread Rusty Russell

On Fri, 19 Jun 2009 04:57:34 pm Christian Bornträger wrote:
> Hello Rusty,
>
> this is a result of a two month internship about virtio testing.

Interesting!

> I would like to get feedback on
>
> o  the general idea of a virtio_test module
> o  the user interface ioctls
> o  further ideas and comments

Not mugging real drivers would be a requirement, I think.

> +config VIRTIO_TEST
> + tristate "Virtio test driver (EXPERIMENTAL)"
> + select VIRTIO
> + select VIRTIO_RING

Perhaps these should be depends?  Plus, depends on EXPERIMENTAL.

> +  If unsure, say M.

That's "N" I think.

> + case VIOTEST_IOCGETBUF:
> + ret = do_get_buf(vtest, (struct viotest_getbuf __user *) arg);
> + break;
> + case VIOTEST_IOCGETCBS:
> + ret = get_callbacks(vtest, (struct viotest_cbinfo __user *) 
> arg);
> + break;

Generally the point of callbacks is to tell you you have new buffers; in fact 
you're insulated from callbacks which don't show new buffers.  So I'm not sure 
these two need to be separate?

In which case, a read/write interface starts to make sense (write for addbuf 
and kick, read for get_buf).  That fits nicely with O_NONBLOCK and poll().

Cheers,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/3] eventfd: add internal reference counting to fix notifier race conditions

2009-06-23 Thread Rusty Russell

On Tue, 23 Jun 2009 03:33:22 am Davide Libenzi wrote:
> What you're doing there, is setting up a kernel-to-kernel (since
> userspace only role is to create the eventfd) communication, using a file*
> as accessory. That IMO is plain wrong.

The most sensible is that userspace can use these fds; an in-kernel variant is 
possible too, but not primary IMHO.

It's nice that userspace create the fds; it can then use the same fd for 
multiple event sources.

But I didn't see anything wrong with the way eventfd used to work: you have a 
kvm ioctl to say "attach this eventfd to this guest notification" and that does 
the eventfd_fget.  A detach ioctl does the fput (as does release of the kvm 
fd).

If they close the eventfd and don't do the detach ioctl, it's their problem.

Rusty.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] kvm: remove in_range from kvm_io_device

2009-06-23 Thread Gregory Haskins

Michael S. Tsirkin wrote:
> Remove in_range from kvm_io_device and ask read/write callbacks, if
> supplied, to perform range checks internally. This allows aliasing
> (mostly for in-kernel virtio), as well as better error handling by
> making it possible to pass errors up to userspace. And it's enough to
> look at the diffstat to see that it's a better API anyway.
>
> While we are at it, document locking rules for kvm_io_device.
>   

Hi Michael,

I just tried to apply this to kvm.git/master, and it blew up really
badly.  What tree should I be using?

-Greg




signature.asc
Description: OpenPGP digital signature

Re: [patch] eventfd - revised interface and cleanups (2nd rev)

2009-06-23 Thread Andrew Morton

On Tue, 23 Jun 2009 15:49:53 -0700 (PDT)
Davide Libenzi  wrote:

> On Tue, 23 Jun 2009, Andrew Morton wrote:
> 
> > On Tue, 23 Jun 2009 14:25:05 -0700 (PDT)
> > Davide Libenzi  wrote:
> > 
> > > On Tue, 23 Jun 2009, Andrew Morton wrote:
> > > 
> > > > That isn't for us to decide. Entire syscalls can be disabled in config.
> > > 
> > > That is not a well defined separate syscall though. It's a member/feature 
> > > of the aiocb.
> > 
> > I don't know what this means, really.
> 
> This is the struct iocb:
> 
> struct iocb {
>   ...
>   u_int32_t   aio_resfd;
> };
> 
> And the only interface to access KAIO is io_submit().
> IMO the end user perceives the KAIO functionality as the full deployment 
> of the iocb struct and the io_submit() accessory.
> Can code not using eventfd work in (AIO && !EVENTFD) mode? Sure.
> It is a kinda confusing configuration from the end user POV IMO.

What's confusing about it?

Most programmers who are using AIO probably don't even know that the
eventd hookup is available.  And even if they did, they might not want
to use it, to remain compatible with older kernels.

I'm still not seeing any compelling reason for unconditionally adding
kernel text which some users don't need.

Maybe there is such a reason, and it hasn't yet been beaten into my
skull.  But I still think it should be done in a separate patch.  One
which comes with a full description of the reasons for the change, btw.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [patch] eventfd - revised interface and cleanups (2nd rev)

2009-06-23 Thread Davide Libenzi

On Tue, 23 Jun 2009, Andrew Morton wrote:

> On Tue, 23 Jun 2009 14:25:05 -0700 (PDT)
> Davide Libenzi  wrote:
> 
> > On Tue, 23 Jun 2009, Andrew Morton wrote:
> > 
> > > That isn't for us to decide. Entire syscalls can be disabled in config.
> > 
> > That is not a well defined separate syscall though. It's a member/feature 
> > of the aiocb.
> 
> I don't know what this means, really.

This is the struct iocb:

struct iocb {
...
u_int32_t   aio_resfd;
};

And the only interface to access KAIO is io_submit().
IMO the end user perceives the KAIO functionality as the full deployment 
of the iocb struct and the io_submit() accessory.
Can code not using eventfd work in (AIO && !EVENTFD) mode? Sure.
It is a kinda confusing configuration from the end user POV IMO.

- Davide

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[patch] eventfd - revised interface and cleanups (3rd rev)

2009-06-23 Thread Davide Libenzi

The following patch changes the eventfd interface to de-couple the eventfd 
memory context, from the file pointer instance.
Without such change, there is no clean way to racely free handle the 
POLLHUP event sent when the last instance of the file* goes away.
Also, now the internal eventfd APIs are using the eventfd context instead 
of the file*.
Another cleanup this patch does, is making AIO select EVENTFD, instead of 
adding a bunch of empty function stubs inside eventfd.h in order to 
handle the (AIO && !EVENTFD) case.


Signed-off-by: Davide Libenzi 


- Davide


---
 drivers/lguest/lg.h  |2 
 drivers/lguest/lguest_user.c |4 -
 fs/aio.c |   24 ++--
 fs/eventfd.c |  122 ++-
 include/linux/aio.h  |4 -
 include/linux/eventfd.h  |   18 ++
 init/Kconfig |1 
 7 files changed, 129 insertions(+), 46 deletions(-)

Index: linux-2.6.mod/fs/eventfd.c
===
--- linux-2.6.mod.orig/fs/eventfd.c 2009-06-21 16:54:15.0 -0700
+++ linux-2.6.mod/fs/eventfd.c  2009-06-23 14:45:54.0 -0700
@@ -14,35 +14,44 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
+#include 
+#include 
 
 struct eventfd_ctx {
+   struct kref kref;
wait_queue_head_t wqh;
/*
 * Every time that a write(2) is performed on an eventfd, the
 * value of the __u64 being written is added to "count" and a
 * wakeup is performed on "wqh". A read(2) will return the "count"
 * value to userspace, and will reset "count" to zero. The kernel
-* size eventfd_signal() also, adds to the "count" counter and
+* side eventfd_signal() also, adds to the "count" counter and
 * issue a wakeup.
 */
__u64 count;
unsigned int flags;
 };
 
-/*
- * Adds "n" to the eventfd counter "count". Returns "n" in case of
- * success, or a value lower then "n" in case of coutner overflow.
- * This function is supposed to be called by the kernel in paths
- * that do not allow sleeping. In this function we allow the counter
- * to reach the ULLONG_MAX value, and we signal this as overflow
- * condition by returining a POLLERR to poll(2).
+/**
+ * eventfd_signal - Adds @n to the eventfd counter.
+ * @ctx: [in] Pointer to the eventfd context.
+ * @n: [in] Value of the counter to be added to the eventfd internal counter.
+ *  The value cannot be negative.
+ *
+ * This function is supposed to be called by the kernel in paths that do not
+ * allow sleeping. In this function we allow the counter to reach the 
ULLONG_MAX
+ * value, and we signal this as overflow condition by returining a POLLERR
+ * to poll(2).
+ *
+ * Returns @n in case of success, a non-negative number lower than @n in case
+ * of overflow, or the following error codes:
+ *
+ * -EINVAL: The value of @n is negative.
  */
-int eventfd_signal(struct file *file, int n)
+int eventfd_signal(struct eventfd_ctx *ctx, int n)
 {
-   struct eventfd_ctx *ctx = file->private_data;
unsigned long flags;
 
if (n < 0)
@@ -59,9 +68,45 @@ int eventfd_signal(struct file *file, in
 }
 EXPORT_SYMBOL_GPL(eventfd_signal);
 
+static void eventfd_free(struct kref *kref)
+{
+   struct eventfd_ctx *ctx = container_of(kref, struct eventfd_ctx, kref);
+
+   kfree(ctx);
+}
+
+/**
+ * eventfd_ctx_get - Acquires a reference to the internal eventfd context.
+ * @ctx: [in] Pointer to the eventfd context.
+ *
+ * Returns: In case of success, returns a pointer to the eventfd context.
+ */
+struct eventfd_ctx *eventfd_ctx_get(struct eventfd_ctx *ctx)
+{
+   kref_get(&ctx->kref);
+   return ctx;
+}
+EXPORT_SYMBOL_GPL(eventfd_ctx_get);
+
+/**
+ * eventfd_ctx_put - Releases a reference to the internal eventfd context.
+ * @ctx: [in] Pointer to eventfd context.
+ *
+ * The eventfd context reference must have been previously acquired either
+ * with eventfd_ctx_get() or eventfd_ctx_fdget()).
+ */
+void eventfd_ctx_put(struct eventfd_ctx *ctx)
+{
+   kref_put(&ctx->kref, eventfd_free);
+}
+EXPORT_SYMBOL_GPL(eventfd_ctx_put);
+
 static int eventfd_release(struct inode *inode, struct file *file)
 {
-   kfree(file->private_data);
+   struct eventfd_ctx *ctx = file->private_data;
+
+   wake_up_poll(&ctx->wqh, POLLHUP);
+   eventfd_ctx_put(ctx);
return 0;
 }
 
@@ -185,6 +230,16 @@ static const struct file_operations even
.write  = eventfd_write,
 };
 
+/**
+ * eventfd_fget - Acquire a reference of an eventfd file descriptor.
+ * @fd: [in] Eventfd file descriptor.
+ *
+ * Returns a pointer to the eventfd file structure in case of success, or the
+ * following error pointer:
+ *
+ * -EBADF: Invalid @fd file descriptor.
+ * -EINVAL   : The @fd file descriptor is not an eventfd file.
+ */
 struct file *eventfd_fget(int fd)
 {
struct file *file;
@@ -201,

Re: [patch] eventfd - revised interface and cleanups (2nd rev)

2009-06-23 Thread Andrew Morton

On Tue, 23 Jun 2009 14:48:51 -0700 (PDT)
Davide Libenzi  wrote:

> > > This becomes pretty painful when the function calls other functions, for 
> > > which just relays the error code.
> > > Should we be just documenting the error codes introduced by the function 
> > > code, and say that the function returns errors A, B, C plus all the ones 
> > > returned by the called functions X, Y, Z?
> > > If not, it becomes hell in maintaining the comments...
> > 
> > Well.  Don't worry about making rules.  Taste and common sense apply.  "Will
> > it be useful to readers if I explicitly document the return value".  If
> > "yes" then document away.  If "no" then don't.
> 
> Are you OK with the format in the patch below?

Looks great to me.

Obviously the cost of maintaining this level of detail is fairly high,
and the chances of bitrot are also high.  So I wouldn't be expecting
people to document things at this level in general.  But if you're
prepared to maintain this then good for you.



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [patch] eventfd - revised interface and cleanups (2nd rev)

2009-06-23 Thread Davide Libenzi

On Tue, 23 Jun 2009, Andrew Morton wrote:

> On Tue, 23 Jun 2009 14:34:50 -0700 (PDT)
> Davide Libenzi  wrote:
> 
> > On Tue, 23 Jun 2009, Andrew Morton wrote:
> > 
> > > > Should functions be describing all the returned error codes, ala man 
> > > > pages?
> > > > 
> > > 
> > > I think so.
> > 
> > This becomes pretty painful when the function calls other functions, for 
> > which just relays the error code.
> > Should we be just documenting the error codes introduced by the function 
> > code, and say that the function returns errors A, B, C plus all the ones 
> > returned by the called functions X, Y, Z?
> > If not, it becomes hell in maintaining the comments...
> 
> Well.  Don't worry about making rules.  Taste and common sense apply.  "Will
> it be useful to readers if I explicitly document the return value".  If
> "yes" then document away.  If "no" then don't.

Are you OK with the format in the patch below?


- Davide


---
 drivers/lguest/lg.h  |2 
 drivers/lguest/lguest_user.c |4 -
 fs/aio.c |   24 ++--
 fs/eventfd.c |  122 ++-
 include/linux/aio.h  |4 -
 include/linux/eventfd.h  |   18 ++
 init/Kconfig |1 
 7 files changed, 129 insertions(+), 46 deletions(-)

Index: linux-2.6.mod/fs/eventfd.c
===
--- linux-2.6.mod.orig/fs/eventfd.c 2009-06-21 16:54:15.0 -0700
+++ linux-2.6.mod/fs/eventfd.c  2009-06-23 14:45:54.0 -0700
@@ -14,35 +14,44 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
+#include 
+#include 
 
 struct eventfd_ctx {
+   struct kref kref;
wait_queue_head_t wqh;
/*
 * Every time that a write(2) is performed on an eventfd, the
 * value of the __u64 being written is added to "count" and a
 * wakeup is performed on "wqh". A read(2) will return the "count"
 * value to userspace, and will reset "count" to zero. The kernel
-* size eventfd_signal() also, adds to the "count" counter and
+* side eventfd_signal() also, adds to the "count" counter and
 * issue a wakeup.
 */
__u64 count;
unsigned int flags;
 };
 
-/*
- * Adds "n" to the eventfd counter "count". Returns "n" in case of
- * success, or a value lower then "n" in case of coutner overflow.
- * This function is supposed to be called by the kernel in paths
- * that do not allow sleeping. In this function we allow the counter
- * to reach the ULLONG_MAX value, and we signal this as overflow
- * condition by returining a POLLERR to poll(2).
+/**
+ * eventfd_signal - Adds @n to the eventfd counter.
+ * @ctx: [in] Pointer to the eventfd context.
+ * @n: [in] Value of the counter to be added to the eventfd internal counter.
+ *  The value cannot be negative.
+ *
+ * This function is supposed to be called by the kernel in paths that do not
+ * allow sleeping. In this function we allow the counter to reach the 
ULLONG_MAX
+ * value, and we signal this as overflow condition by returining a POLLERR
+ * to poll(2).
+ *
+ * Returns @n in case of success, a non-negative number lower than @n in case
+ * of overflow, or the following error codes:
+ *
+ * -EINVAL: The value of @n is negative.
  */
-int eventfd_signal(struct file *file, int n)
+int eventfd_signal(struct eventfd_ctx *ctx, int n)
 {
-   struct eventfd_ctx *ctx = file->private_data;
unsigned long flags;
 
if (n < 0)
@@ -59,9 +68,45 @@ int eventfd_signal(struct file *file, in
 }
 EXPORT_SYMBOL_GPL(eventfd_signal);
 
+static void eventfd_free(struct kref *kref)
+{
+   struct eventfd_ctx *ctx = container_of(kref, struct eventfd_ctx, kref);
+
+   kfree(ctx);
+}
+
+/**
+ * eventfd_ctx_get - Acquires a reference to the internal eventfd context.
+ * @ctx: [in] Pointer to the eventfd context.
+ *
+ * Returns: In case of success, returns a pointer to the eventfd context.
+ */
+struct eventfd_ctx *eventfd_ctx_get(struct eventfd_ctx *ctx)
+{
+   kref_get(&ctx->kref);
+   return ctx;
+}
+EXPORT_SYMBOL_GPL(eventfd_ctx_get);
+
+/**
+ * eventfd_ctx_put - Releases a reference to the internal eventfd context.
+ * @ctx: [in] Pointer to eventfd context.
+ *
+ * The eventfd context reference must have been previously acquired either
+ * with eventfd_ctx_get() or eventfd_ctx_fdget()).
+ */
+void eventfd_ctx_put(struct eventfd_ctx *ctx)
+{
+   kref_put(&ctx->kref, eventfd_free);
+}
+EXPORT_SYMBOL_GPL(eventfd_ctx_put);
+
 static int eventfd_release(struct inode *inode, struct file *file)
 {
-   kfree(file->private_data);
+   struct eventfd_ctx *ctx = file->private_data;
+
+   wake_up_poll(&ctx->wqh, POLLHUP);
+   eventfd_ctx_put(ctx);
return 0;
 }
 
@@ -185,6 +230,16 @@ static const struct file_operations even
.write  = eventfd_write,
 };
 
+/**
+ * eventfd_fget - Acquire a reference of an e

Re: [patch] eventfd - revised interface and cleanups (2nd rev)

2009-06-23 Thread Andrew Morton

On Tue, 23 Jun 2009 14:34:50 -0700 (PDT)
Davide Libenzi  wrote:

> On Tue, 23 Jun 2009, Andrew Morton wrote:
> 
> > > Should functions be describing all the returned error codes, ala man 
> > > pages?
> > > 
> > 
> > I think so.
> 
> This becomes pretty painful when the function calls other functions, for 
> which just relays the error code.
> Should we be just documenting the error codes introduced by the function 
> code, and say that the function returns errors A, B, C plus all the ones 
> returned by the called functions X, Y, Z?
> If not, it becomes hell in maintaining the comments...

Well.  Don't worry about making rules.  Taste and common sense apply.  "Will
it be useful to readers if I explicitly document the return value".  If
"yes" then document away.  If "no" then don't.


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [patch] eventfd - revised interface and cleanups (2nd rev)

2009-06-23 Thread Andrew Morton

On Tue, 23 Jun 2009 14:25:05 -0700 (PDT)
Davide Libenzi  wrote:

> On Tue, 23 Jun 2009, Andrew Morton wrote:
> 
> > That isn't for us to decide. Entire syscalls can be disabled in config.
> 
> That is not a well defined separate syscall though. It's a member/feature 
> of the aiocb.

I don't know what this means, really.

AIO eventfd support is a relatively recently added enhancement to AIO,
is it not?  Applications which continue to use the pre-May07 AIO
features will continue to work as before (they darn well better).  So
for such applications, AIO=y/EVENTFD=n kernels are usable and useful,
and eliminating this option is a loss?

Either way, I believe that this change should be unbundled from the
unrelated KVM one so we can handle it separately.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [patch] eventfd - revised interface and cleanups (2nd rev)

2009-06-23 Thread Davide Libenzi

On Tue, 23 Jun 2009, Andrew Morton wrote:

> > Should functions be describing all the returned error codes, ala man pages?
> > 
> 
> I think so.

This becomes pretty painful when the function calls other functions, for 
which just relays the error code.
Should we be just documenting the error codes introduced by the function 
code, and say that the function returns errors A, B, C plus all the ones 
returned by the called functions X, Y, Z?
If not, it becomes hell in maintaining the comments...

- Davide

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [patch] eventfd - revised interface and cleanups (2nd rev)

2009-06-23 Thread Davide Libenzi

On Tue, 23 Jun 2009, Andrew Morton wrote:

> > > > +struct eventfd_ctx *eventfd_ctx_get(struct eventfd_ctx *ctx)
> > > > +{
> > > > +   kref_get(&ctx->kref);
> > > > +   return ctx;
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(eventfd_ctx_get);
> > > 
> > > doesn't match the code.
> 
> You appear to have not seen the above sentence.

I saw that and have already fixed it. Sorry I missed the ACK.



> yeah, I never trust people.  You might lose the email or jump on a
> plane and disappear for three weeks, then it all gets forgotten about
> and lost.

Jumping on a plane. Check. Disappearing for 6 weeks. Check. ;)


- Davide


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [patch] eventfd - revised interface and cleanups (2nd rev)

2009-06-23 Thread Davide Libenzi

On Tue, 23 Jun 2009, Andrew Morton wrote:

> That isn't for us to decide. Entire syscalls can be disabled in config.

That is not a well defined separate syscall though. It's a member/feature 
of the aiocb.


- Davide


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [patch] eventfd - revised interface and cleanups (2nd rev)

2009-06-23 Thread Andrew Morton

On Tue, 23 Jun 2009 14:03:22 -0700 (PDT)
Davide Libenzi  wrote:

> On Tue, 23 Jun 2009, Andrew Morton wrote:
> 
> > On Tue, 23 Jun 2009 12:25:36 -0700 (PDT)
> > Davide Libenzi  wrote:
> > 
> > > The following patch changes the eventfd interface to de-couple the 
> > > eventfd 
> > > memory context, from the file pointer instance.
> > > Without such change, there is no clean way to racely free handle the 
> > > POLLHUP event sent when the last instance of the file* goes away.
> > > Also, now the internal eventfd APIs are using the eventfd context instead 
> > > of the file*.
> > > Another cleanup this patch does, is making AIO select EVENTFD, instead of 
> > > adding a bunch of empty function stubs inside eventfd.h in order to 
> > > handle the (AIO && !EVENTFD) case.
> > > 
> > > ...
> > >
> > > +/**
> > > + * eventfd_ctx_get - Acquires a reference to the internal eventfd 
> > > context.
> > > + * @ctx: [in] Pointer to the eventfd context.
> > > + *
> > > + * Returns: In case of success, returns a pointer to the eventfd context,
> > > + *  otherwise a proper error code.
> > 
> > The description of the return value
> 
> Should functions be describing all the returned error codes, ala man pages?
> 

I think so.

> 
> > > + */
> > > +struct eventfd_ctx *eventfd_ctx_get(struct eventfd_ctx *ctx)
> > > +{
> > > + kref_get(&ctx->kref);
> > > + return ctx;
> > > +}
> > > +EXPORT_SYMBOL_GPL(eventfd_ctx_get);
> > 
> > doesn't match the code.

You appear to have not seen the above sentence.

> > Also...
> > 
> > > + * Returns: A pointer to the eventfd file structure in case of success, 
> > > or a
> > > + *  proper error pointer in case of failure.
> > 
> > 
> > > + * Returns: In case of success, it returns a pointer to the internal 
> > > eventfd
> > > + *  context, otherwise a proper error code.
> > > + */
> > 
> > I'm unsure what the word "proper" means in this context.
> > 
> > The term "proper error pointer" is understandable enough - something
> > you run IS_ERR() against.  "error pointer" would suffice.
> > 
> > But the term "proper error code" is getting a bit remote from reality.
> > 
> > Unfortunately the kernel doesn't have a simple and agreed-to term for
> > an ERR_PTR() thingy.  Perhaps we should invent one.  "err_ptr"?
> 
> OK, but you tricked me once again :)
> You posted your comments/changes while you merged the old version in -mm 
> already.

yeah, I never trust people.  You might lose the email or jump on a
plane and disappear for three weeks, then it all gets forgotten about
and lost.

If the code doesn't have any apparent showstoppers I'll often merge it
with a note that it isn't finalised.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [patch] eventfd - revised interface and cleanups (2nd rev)

2009-06-23 Thread Andrew Morton

On Tue, 23 Jun 2009 13:59:07 -0700 (PDT)
Davide Libenzi  wrote:

> On Tue, 23 Jun 2009, Andrew Morton wrote:
> 
> > On Tue, 23 Jun 2009 12:25:36 -0700 (PDT)
> > Davide Libenzi  wrote:
> > 
> > > Another cleanup this patch does, is making AIO select EVENTFD, instead of 
> > > adding a bunch of empty function stubs inside eventfd.h in order to 
> > > handle the (AIO && !EVENTFD) case.
> > 
> > Given that we're trying to squeak this patch into 2.6.31, it's quite
> > unfortunate to include unrelated changes.  Especially ones which
> > involve the always-problematic "select".  Although this one looks less
> > than usually deadly due to the simple dependencies of AIO and eventfd.
> > 
> > However...
> > 
> > Is this a good way of fixing the (AIO && !EVENTFD) case?  Surely if the
> > developer selected this combination, he doesn't want to carry the
> > overhead of including eventfd.  IOW, if AIO can work acceptably without
> > eventfd being compiled into the kernel then adding the stubs is a
> > superior solution.
> 
> IMO when someone says "AIO included in the kernel", should get the whole 
> AIO functionality and not part of it.
> People already started using KAIO+eventfd, and a (AIO && !EVENTFD) config 
> will make their code fail.

That isn't for us to decide. Entire syscalls can be disabled in config.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [patch] eventfd - revised interface and cleanups (2nd rev)

2009-06-23 Thread Davide Libenzi

On Tue, 23 Jun 2009, Andrew Morton wrote:

> On Tue, 23 Jun 2009 12:25:36 -0700 (PDT)
> Davide Libenzi  wrote:
> 
> > The following patch changes the eventfd interface to de-couple the eventfd 
> > memory context, from the file pointer instance.
> > Without such change, there is no clean way to racely free handle the 
> > POLLHUP event sent when the last instance of the file* goes away.
> > Also, now the internal eventfd APIs are using the eventfd context instead 
> > of the file*.
> > Another cleanup this patch does, is making AIO select EVENTFD, instead of 
> > adding a bunch of empty function stubs inside eventfd.h in order to 
> > handle the (AIO && !EVENTFD) case.
> > 
> > ...
> >
> > +/**
> > + * eventfd_ctx_get - Acquires a reference to the internal eventfd context.
> > + * @ctx: [in] Pointer to the eventfd context.
> > + *
> > + * Returns: In case of success, returns a pointer to the eventfd context,
> > + *  otherwise a proper error code.
> 
> The description of the return value

Should functions be describing all the returned error codes, ala man pages?



> > + */
> > +struct eventfd_ctx *eventfd_ctx_get(struct eventfd_ctx *ctx)
> > +{
> > +   kref_get(&ctx->kref);
> > +   return ctx;
> > +}
> > +EXPORT_SYMBOL_GPL(eventfd_ctx_get);
> 
> doesn't match the code.
> 
> Also...
> 
> > + * Returns: A pointer to the eventfd file structure in case of success, or 
> > a
> > + *  proper error pointer in case of failure.
> 
> 
> > + * Returns: In case of success, it returns a pointer to the internal 
> > eventfd
> > + *  context, otherwise a proper error code.
> > + */
> 
> I'm unsure what the word "proper" means in this context.
> 
> The term "proper error pointer" is understandable enough - something
> you run IS_ERR() against.  "error pointer" would suffice.
> 
> But the term "proper error code" is getting a bit remote from reality.
> 
> Unfortunately the kernel doesn't have a simple and agreed-to term for
> an ERR_PTR() thingy.  Perhaps we should invent one.  "err_ptr"?

OK, but you tricked me once again :)
You posted your comments/changes while you merged the old version in -mm 
already.



- Davide


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [patch] eventfd - revised interface and cleanups (2nd rev)

2009-06-23 Thread Davide Libenzi

On Tue, 23 Jun 2009, Andrew Morton wrote:

> On Tue, 23 Jun 2009 12:25:36 -0700 (PDT)
> Davide Libenzi  wrote:
> 
> > Another cleanup this patch does, is making AIO select EVENTFD, instead of 
> > adding a bunch of empty function stubs inside eventfd.h in order to 
> > handle the (AIO && !EVENTFD) case.
> 
> Given that we're trying to squeak this patch into 2.6.31, it's quite
> unfortunate to include unrelated changes.  Especially ones which
> involve the always-problematic "select".  Although this one looks less
> than usually deadly due to the simple dependencies of AIO and eventfd.
> 
> However...
> 
> Is this a good way of fixing the (AIO && !EVENTFD) case?  Surely if the
> developer selected this combination, he doesn't want to carry the
> overhead of including eventfd.  IOW, if AIO can work acceptably without
> eventfd being compiled into the kernel then adding the stubs is a
> superior solution.

IMO when someone says "AIO included in the kernel", should get the whole 
AIO functionality and not part of it.
People already started using KAIO+eventfd, and a (AIO && !EVENTFD) config 
will make their code fail.


- Davide


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [patch] eventfd - revised interface and cleanups (2nd rev)

2009-06-23 Thread Andrew Morton

On Tue, 23 Jun 2009 12:25:36 -0700 (PDT)
Davide Libenzi  wrote:

> The following patch changes the eventfd interface to de-couple the eventfd 
> memory context, from the file pointer instance.
> Without such change, there is no clean way to racely free handle the 
> POLLHUP event sent when the last instance of the file* goes away.
> Also, now the internal eventfd APIs are using the eventfd context instead 
> of the file*.
> Another cleanup this patch does, is making AIO select EVENTFD, instead of 
> adding a bunch of empty function stubs inside eventfd.h in order to 
> handle the (AIO && !EVENTFD) case.
> 
> 
> ...
>
> +/**
> + * eventfd_ctx_get - Acquires a reference to the internal eventfd context.
> + * @ctx: [in] Pointer to the eventfd context.
> + *
> + * Returns: In case of success, returns a pointer to the eventfd context,
> + *  otherwise a proper error code.

The description of the return value

> + */
> +struct eventfd_ctx *eventfd_ctx_get(struct eventfd_ctx *ctx)
> +{
> + kref_get(&ctx->kref);
> + return ctx;
> +}
> +EXPORT_SYMBOL_GPL(eventfd_ctx_get);

doesn't match the code.


Also...

> + * Returns: A pointer to the eventfd file structure in case of success, or a
> + *  proper error pointer in case of failure.


> + * Returns: In case of success, it returns a pointer to the internal eventfd
> + *  context, otherwise a proper error code.
> + */

I'm unsure what the word "proper" means in this context.

The term "proper error pointer" is understandable enough - something
you run IS_ERR() against.  "error pointer" would suffice.

But the term "proper error code" is getting a bit remote from reality.


Unfortunately the kernel doesn't have a simple and agreed-to term for
an ERR_PTR() thingy.  Perhaps we should invent one.  "err_ptr"?

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [patch] eventfd - revised interface and cleanups (2nd rev)

2009-06-23 Thread Andrew Morton

On Tue, 23 Jun 2009 12:25:36 -0700 (PDT)
Davide Libenzi  wrote:

> Another cleanup this patch does, is making AIO select EVENTFD, instead of 
> adding a bunch of empty function stubs inside eventfd.h in order to 
> handle the (AIO && !EVENTFD) case.

Given that we're trying to squeak this patch into 2.6.31, it's quite
unfortunate to include unrelated changes.  Especially ones which
involve the always-problematic "select".  Although this one looks less
than usually deadly due to the simple dependencies of AIO and eventfd.

However...

Is this a good way of fixing the (AIO && !EVENTFD) case?  Surely if the
developer selected this combination, he doesn't want to carry the
overhead of including eventfd.  IOW, if AIO can work acceptably without
eventfd being compiled into the kernel then adding the stubs is a
superior solution.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [patch] eventfd - revised interface and cleanups (2nd rev)

2009-06-23 Thread Davide Libenzi

On Tue, 23 Jun 2009, Andrew Morton wrote:

> On Tue, 23 Jun 2009 12:25:36 -0700 (PDT)
> Davide Libenzi  wrote:
> 
> > The following patch changes the eventfd interface to de-couple the eventfd 
> > memory context, from the file pointer instance.
> > Without such change, there is no clean way to racely free handle the 
> > POLLHUP event sent when the last instance of the file* goes away.
> > Also, now the internal eventfd APIs are using the eventfd context instead 
> > of the file*.
> > Another cleanup this patch does, is making AIO select EVENTFD, instead of 
> > adding a bunch of empty function stubs inside eventfd.h in order to 
> > handle the (AIO && !EVENTFD) case.
> > 
> 
> If we really want to squeeze this into 2.6.31 then it would be helpful
> to have justification in the changelog, please.  I see that some KVM
> feature needs it, but what and why and why now, etc?

The patch in -next added the ability to have waiters to be notified when 
the last instance of the file* is dropped (that is, on ->release).
But it is not possible for waiters to handle the POLLHUP event in a 
racy-free way, unless the eventfd memory context is de-coupled from the 
file*.
The next patches from gregory will use eventfd (and the POLLHUP handling) 
to implement the IRQfd code inside KVM.

> What's going on here?

The POLLHUP patch in -next is not sufficent. I based the patch over 
mainline, but if you guys want I can rebase it over -next.

- Davide

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [patch] eventfd - revised interface and cleanups (2nd rev)

2009-06-23 Thread Andrew Morton

On Tue, 23 Jun 2009 12:25:36 -0700 (PDT)
Davide Libenzi  wrote:

> The following patch changes the eventfd interface to de-couple the eventfd 
> memory context, from the file pointer instance.
> Without such change, there is no clean way to racely free handle the 
> POLLHUP event sent when the last instance of the file* goes away.
> Also, now the internal eventfd APIs are using the eventfd context instead 
> of the file*.
> Another cleanup this patch does, is making AIO select EVENTFD, instead of 
> adding a bunch of empty function stubs inside eventfd.h in order to 
> handle the (AIO && !EVENTFD) case.
> 

If we really want to squeeze this into 2.6.31 then it would be helpful
to have justification in the changelog, please.  I see that some KVM
feature needs it, but what and why and why now, etc?

The patch conflicts with similar changes int he KVM tree, but that'll
just ruin sfr's life for a day.


Your patch has:


> ...
>  static int eventfd_release(struct inode *inode, struct file *file)
>  {
> - kfree(file->private_data);
> + struct eventfd_ctx *ctx = file->private_data;
> +
> + wake_up_poll(&ctx->wqh, POLLHUP);
> + eventfd_ctx_put(ctx);
>   return 0;
>  }

but the patch in linux-next has

static int eventfd_release(struct inode *inode, struct file *file)
{
struct eventfd_ctx *ctx = file->private_data;

/*
 * No need to hold the lock here, since we are on the file cleanup
 * path and the ones still attached to the wait queue will be
 * serialized by wake_up_locked_poll().
 */
wake_up_locked_poll(&ctx->wqh, POLLHUP);
kfree(ctx);
return 0;
}

which looks quite different (and has a nice comment!)

What's going on here?
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[patch] eventfd - revised interface and cleanups (2nd rev)

2009-06-23 Thread Davide Libenzi

The following patch changes the eventfd interface to de-couple the eventfd 
memory context, from the file pointer instance.
Without such change, there is no clean way to racely free handle the 
POLLHUP event sent when the last instance of the file* goes away.
Also, now the internal eventfd APIs are using the eventfd context instead 
of the file*.
Another cleanup this patch does, is making AIO select EVENTFD, instead of 
adding a bunch of empty function stubs inside eventfd.h in order to 
handle the (AIO && !EVENTFD) case.



Signed-off-by: Davide Libenzi 


- Davide


---
 drivers/lguest/lg.h  |2 
 drivers/lguest/lguest_user.c |4 -
 fs/aio.c |   24 ++--
 fs/eventfd.c |  115 ++-
 include/linux/aio.h  |4 -
 include/linux/eventfd.h  |   18 ++
 init/Kconfig |1 
 7 files changed, 122 insertions(+), 46 deletions(-)

Index: linux-2.6.mod/fs/eventfd.c
===
--- linux-2.6.mod.orig/fs/eventfd.c 2009-06-21 16:54:15.0 -0700
+++ linux-2.6.mod/fs/eventfd.c  2009-06-23 10:57:58.0 -0700
@@ -14,35 +14,41 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
+#include 
+#include 
 
 struct eventfd_ctx {
+   struct kref kref;
wait_queue_head_t wqh;
/*
 * Every time that a write(2) is performed on an eventfd, the
 * value of the __u64 being written is added to "count" and a
 * wakeup is performed on "wqh". A read(2) will return the "count"
 * value to userspace, and will reset "count" to zero. The kernel
-* size eventfd_signal() also, adds to the "count" counter and
+* side eventfd_signal() also, adds to the "count" counter and
 * issue a wakeup.
 */
__u64 count;
unsigned int flags;
 };
 
-/*
- * Adds "n" to the eventfd counter "count". Returns "n" in case of
- * success, or a value lower then "n" in case of coutner overflow.
- * This function is supposed to be called by the kernel in paths
- * that do not allow sleeping. In this function we allow the counter
- * to reach the ULLONG_MAX value, and we signal this as overflow
- * condition by returining a POLLERR to poll(2).
+/**
+ * eventfd_signal - Adds @n to the eventfd counter.
+ * @ctx: [in] Pointer to the eventfd context.
+ * @n: [in] Value of the counter to be added to the eventfd internal counter.
+ *
+ * This function is supposed to be called by the kernel in paths that do not
+ * allow sleeping. In this function we allow the counter to reach the 
ULLONG_MAX
+ * value, and we signal this as overflow condition by returining a POLLERR
+ * to poll(2).
+ *
+ * Returns: In case of success, it returns @n, otherwise (in case of overflow
+ *  of the eventfd 64bit internal counter) a value lower than @n.
  */
-int eventfd_signal(struct file *file, int n)
+int eventfd_signal(struct eventfd_ctx *ctx, int n)
 {
-   struct eventfd_ctx *ctx = file->private_data;
unsigned long flags;
 
if (n < 0)
@@ -59,9 +65,48 @@ int eventfd_signal(struct file *file, in
 }
 EXPORT_SYMBOL_GPL(eventfd_signal);
 
+static void eventfd_free(struct kref *kref)
+{
+   struct eventfd_ctx *ctx = container_of(kref, struct eventfd_ctx, kref);
+
+   kfree(ctx);
+}
+
+/**
+ * eventfd_ctx_get - Acquires a reference to the internal eventfd context.
+ * @ctx: [in] Pointer to the eventfd context.
+ *
+ * Returns: In case of success, returns a pointer to the eventfd context,
+ *  otherwise a proper error code.
+ */
+struct eventfd_ctx *eventfd_ctx_get(struct eventfd_ctx *ctx)
+{
+   kref_get(&ctx->kref);
+   return ctx;
+}
+EXPORT_SYMBOL_GPL(eventfd_ctx_get);
+
+/**
+ * eventfd_ctx_put - Releases a reference to the internal eventfd context.
+ * @ctx: [in] Pointer to eventfd context.
+ *
+ * The eventfd context reference must have been previously acquired either
+ * with eventfd_ctx_get() or eventfd_ctx_fdget()).
+ *
+ * Returns: Nothing.
+ */
+void eventfd_ctx_put(struct eventfd_ctx *ctx)
+{
+   kref_put(&ctx->kref, eventfd_free);
+}
+EXPORT_SYMBOL_GPL(eventfd_ctx_put);
+
 static int eventfd_release(struct inode *inode, struct file *file)
 {
-   kfree(file->private_data);
+   struct eventfd_ctx *ctx = file->private_data;
+
+   wake_up_poll(&ctx->wqh, POLLHUP);
+   eventfd_ctx_put(ctx);
return 0;
 }
 
@@ -185,6 +230,13 @@ static const struct file_operations even
.write  = eventfd_write,
 };
 
+/**
+ * eventfd_fget - Acquire a reference of an eventfd file descriptor.
+ * @fd: [in] Eventfd file descriptor.
+ *
+ * Returns: A pointer to the eventfd file structure in case of success, or a
+ *  proper error pointer in case of failure.
+ */
 struct file *eventfd_fget(int fd)
 {
struct file *file;
@@ -201,6 +253,44 @@ struct file *eventfd_fget(int fd)
 }
 EXPORT_SYMBOL_GPL(eventfd_fget);

KVM: x86: missing locking in PIT/IRQCHIP/SET_BSP_CPU ioctl paths

2009-06-23 Thread Marcelo Tosatti


Correct missing locking in a few places in x86's vm_ioctl handling path.

Signed-off-by: Marcelo Tosatti 

Index: kvm/arch/x86/kvm/x86.c
===
--- kvm.orig/arch/x86/kvm/x86.c
+++ kvm/arch/x86/kvm/x86.c
@@ -1951,19 +1951,25 @@ static int kvm_vm_ioctl_set_irqchip(stru
r = 0;
switch (chip->chip_id) {
case KVM_IRQCHIP_PIC_MASTER:
+   spin_lock(&pic_irqchip(kvm)->lock);
memcpy(&pic_irqchip(kvm)->pics[0],
&chip->chip.pic,
sizeof(struct kvm_pic_state));
+   spin_unlock(&pic_irqchip(kvm)->lock);
break;
case KVM_IRQCHIP_PIC_SLAVE:
+   spin_lock(&pic_irqchip(kvm)->lock);
memcpy(&pic_irqchip(kvm)->pics[1],
&chip->chip.pic,
sizeof(struct kvm_pic_state));
+   spin_unlock(&pic_irqchip(kvm)->lock);
break;
case KVM_IRQCHIP_IOAPIC:
+   mutex_lock(&kvm->irq_lock);
memcpy(ioapic_irqchip(kvm),
&chip->chip.ioapic,
sizeof(struct kvm_ioapic_state));
+   mutex_unlock(&kvm->irq_lock);
break;
default:
r = -EINVAL;
@@ -1977,7 +1983,9 @@ static int kvm_vm_ioctl_get_pit(struct k
 {
int r = 0;
 
+   mutex_lock(&kvm->arch.vpit->pit_state.lock);
memcpy(ps, &kvm->arch.vpit->pit_state, sizeof(struct kvm_pit_state));
+   mutex_unlock(&kvm->arch.vpit->pit_state.lock);
return r;
 }
 
@@ -1985,8 +1993,10 @@ static int kvm_vm_ioctl_set_pit(struct k
 {
int r = 0;
 
+   mutex_lock(&kvm->arch.vpit->pit_state.lock);
memcpy(&kvm->arch.vpit->pit_state, ps, sizeof(struct kvm_pit_state));
kvm_pit_load_count(kvm, 0, ps->channels[0].count);
+   mutex_unlock(&kvm->arch.vpit->pit_state.lock);
return r;
 }
 
@@ -1995,7 +2005,9 @@ static int kvm_vm_ioctl_reinject(struct 
 {
if (!kvm->arch.vpit)
return -ENXIO;
+   mutex_lock(&kvm->arch.vpit->pit_state.lock);
kvm->arch.vpit->pit_state.pit_timer.reinject = control->pit_reinject;
+   mutex_unlock(&kvm->arch.vpit->pit_state.lock);
return 0;
 }
 
Index: kvm/arch/x86/kvm/i8254.c
===
--- kvm.orig/arch/x86/kvm/i8254.c
+++ kvm/arch/x86/kvm/i8254.c
@@ -342,9 +342,7 @@ static void pit_load_count(struct kvm *k
 
 void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val)
 {
-   mutex_lock(&kvm->arch.vpit->pit_state.lock);
pit_load_count(kvm, channel, val);
-   mutex_unlock(&kvm->arch.vpit->pit_state.lock);
 }
 
 static inline struct kvm_pit *dev_to_pit(struct kvm_io_device *dev)
Index: kvm/virt/kvm/kvm_main.c
===
--- kvm.orig/virt/kvm/kvm_main.c
+++ kvm/virt/kvm/kvm_main.c
@@ -2248,10 +2248,12 @@ static long kvm_vm_ioctl(struct file *fi
 #ifdef CONFIG_KVM_APIC_ARCHITECTURE
case KVM_SET_BOOT_CPU_ID:
r = 0;
+   mutex_lock(&kvm->lock);
if (atomic_read(&kvm->online_vcpus) != 0)
r = -EBUSY;
else
kvm->bsp_vcpu_id = arg;
+   mutex_unlock(&kvm->lock);
break;
 #endif
default:
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [patch] eventfd - revised interface and cleanups

2009-06-23 Thread Davide Libenzi

On Tue, 23 Jun 2009, Gregory Haskins wrote:

> I think the lack of a way to got from a file* to a ctx* (or back) might
> be a problem.  I am not an expert, but if I understood the gist of what
> Al Viro (cc'd) was saying early on, an fd can change meaning out from
> under you without warning.
> 
> With the code the way it is now, I would need to call both
> eventfd_fget() and eventfd_ctx_fdget() to fully acquire state (unless I
> am missing something).  I would think that is a race and I am not
> guaranteed to get the same object.  Can we also have a way to convert
> one reference to the other?  I am thinking
> 
> struct eventfd_ctx *eventfd_ctx_fget(struct file *)
> 
> (or similar) would be sufficient.

You're right. That function was available in the previous patch, but then 
I dropped it because I thought the eventfd code could be simplified. But 
yes, it's needed.



- Davide


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [patch] eventfd - revised interface and cleanups

2009-06-23 Thread Gregory Haskins

Davide Libenzi wrote:
> The following patch changes the eventfd interface to de-couple the eventfd 
> memory context, from the file pointer instance.
> Without such change, there is no clean way to racely free handle the 
> POLLHUP event sent when the last instance of the file* goes away.
> Also, now the internal eventfd APIs are using the eventfd context instead 
> of the file*.
> Another cleanup this patch does, is making AIO select EVENTFD, instead of 
> adding a bunch of empty function stubs inside eventfd.h.
>
> Andrew, this better go via Avi and the KVM tree, since they have patches 
> that will be based on the new interface.
>
>
>
> Signed-off-by: Davide Libenzi 
>
>
> - Davide
>
>
> ---
>  drivers/lguest/lg.h  |2 
>  drivers/lguest/lguest_user.c |4 -
>  fs/aio.c |   24 ++
>  fs/eventfd.c |  101 
> ++-
>  include/linux/aio.h  |4 -
>  include/linux/eventfd.h  |   17 ++-
>  init/Kconfig |1 
>  7 files changed, 108 insertions(+), 45 deletions(-)
>
> Index: linux-2.6.mod/fs/eventfd.c
> ===
> --- linux-2.6.mod.orig/fs/eventfd.c   2009-06-21 16:54:15.0 -0700
> +++ linux-2.6.mod/fs/eventfd.c2009-06-23 09:34:42.0 -0700
> @@ -17,32 +17,38 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  struct eventfd_ctx {
> + struct kref kref;
>   wait_queue_head_t wqh;
>   /*
>* Every time that a write(2) is performed on an eventfd, the
>* value of the __u64 being written is added to "count" and a
>* wakeup is performed on "wqh". A read(2) will return the "count"
>* value to userspace, and will reset "count" to zero. The kernel
> -  * size eventfd_signal() also, adds to the "count" counter and
> +  * side eventfd_signal() also, adds to the "count" counter and
>* issue a wakeup.
>*/
>   __u64 count;
>   unsigned int flags;
>  };
>  
> -/*
> - * Adds "n" to the eventfd counter "count". Returns "n" in case of
> - * success, or a value lower then "n" in case of coutner overflow.
> - * This function is supposed to be called by the kernel in paths
> - * that do not allow sleeping. In this function we allow the counter
> - * to reach the ULLONG_MAX value, and we signal this as overflow
> - * condition by returining a POLLERR to poll(2).
> +/**
> + * eventfd_signal - Adds @n to the eventfd counter. This function is
> + *  supposed to be called by the kernel in paths that do not
> + *  allow sleeping. In this function we allow the counter
> + *  to reach the ULLONG_MAX value, and we signal this as
> + *  overflow condition by returining a POLLERR to poll(2).
> + *
> + * @ctx: [in] Pointer to the eventfd context.
> + * @n: [in] Value of the counter to be added to the eventfd internal counter.
> + *
> + * Returns: In case of success, it returns @n, otherwise (in case of overflow
> + *  of the eventfd 64bit internal counter) a value lower than @n.
>   */
> -int eventfd_signal(struct file *file, int n)
> +int eventfd_signal(struct eventfd_ctx *ctx, int n)
>  {
> - struct eventfd_ctx *ctx = file->private_data;
>   unsigned long flags;
>  
>   if (n < 0)
> @@ -59,9 +65,49 @@ int eventfd_signal(struct file *file, in
>  }
>  EXPORT_SYMBOL_GPL(eventfd_signal);
>  
> +static void eventfd_free(struct kref *kref)
> +{
> + struct eventfd_ctx *ctx = container_of(kref, struct eventfd_ctx, kref);
> +
> + kfree(ctx);
> +}
> +
> +/**
> + * eventfd_ctx_get - Acquires a reference to the internal eventfd context.
> + *
> + * @ctx: [in] Pointer to the eventfd context.
> + *
> + * Returns: In case of success, returns a pointer to the eventfd context,
> + *  otherwise a proper error code.
> + */
> +struct eventfd_ctx *eventfd_ctx_get(struct eventfd_ctx *ctx)
> +{
> + kref_get(&ctx->kref);
> + return ctx;
> +}
> +EXPORT_SYMBOL_GPL(eventfd_ctx_get);
> +
> +/**
> + * eventfd_ctx_put - Releases a reference to the internal eventfd context
> + *   (previously acquired either with eventfd_ctx_get() or
> + *   eventfd_ctx_fdget()).
> + *
> + * @ctx: [in] Pointer to eventfd context.
> + *
> + * Returns: Nothing.
> + */
> +void eventfd_ctx_put(struct eventfd_ctx *ctx)
> +{
> + kref_put(&ctx->kref, eventfd_free);
> +}
> +EXPORT_SYMBOL_GPL(eventfd_ctx_put);
> +
>  static int eventfd_release(struct inode *inode, struct file *file)
>  {
> - kfree(file->private_data);
> + struct eventfd_ctx *ctx = file->private_data;
> +
> + wake_up_poll(&ctx->wqh, POLLHUP);
> + eventfd_ctx_put(ctx);
>   return 0;
>  }
>  
> @@ -185,6 +231,14 @@ static const struct file_operations even
>   .write  = eventfd_write,
>  };
>  
> +/**
> + * eventfd_fget - Acquire a reference of an eventfd file descriptor.

virtio_blk fails for rhel5.3 guest with LVM: kernel panic

2009-06-23 Thread sudhir kumar

I see that Rhel5.3 32 and 64 bit guests fail to boot with virtio for
block device. The
guest can not find the root filesystem. The guest is using LVM. As per
the linux-kvm wiki  instructions I did change device.map and booted
with if=virtio.
The wiki link is http://www.linux-kvm.org/page/Boot_from_virtio_block_device

qemu command:
# qemu-system-x86_64 -drive file=rhel5-32.raw,if=virtio,boot=on -m 2048 -smp 4
-net nic,model=rtl8139,macaddr=00:FF:FE:00:00:64 -net
tap,script=/root/qemu-ifup-latest -name 32virtio -vnc :2 -watchdog i6300esb
-watchdog-action reset &
The image was installed using if=ide.

The messages before panic shows that the guest was unable to mount
root file system as below.
"Volume group Volgroup00 not found
mount: could not find filesystem /dev/root
setuproot: moving /dev failed: no such file or directory
setuproot: mounting /proc failed: no such file or directory
setuproot: mounting /sys failed: no such file or directory
switchroot: mount failed: no such file or directory
kernle panic- not syncing: Attempted to kill init "

I see similar messages for 64 bit guest as well.

This is the information from within guest:
[r...@ichigo-dom101 ~]# mount
/dev/mapper/VolGroup00-LogVol00 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/hda1 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
[r...@ichigo-dom101 ~]# fdisk -l

Disk /dev/hda: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot  Start End  Blocks   Id  System
/dev/hda1   *   1  13  104391   83  Linux
/dev/hda2  14130510377990   8e  Linux LVM
[r...@ichigo-dom101 ~]# cat /boot/grub/device.map
# this device map was generated by anaconda
(hd0) /dev/vda

[r...@ichigo-dom101 ~]# cat /boot/grub/menu.lst
# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#  all kernel and initrd paths are relative to /boot/, eg.
#  root (hd0,0)
#  kernel /vmlinuz-version ro root=/dev/VolGroup00/LogVol00
#  initrd /initrd-version.img
#boot=/dev/hda
default=0
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title Red Hat Enterprise Linux Server (2.6.18-128.el5PAE)
root (hd0,0)
kernel /vmlinuz-2.6.18-128.el5PAE ro root=/dev/VolGroup00/LogVol00 rhgb
quiet
initrd /initrd-2.6.18-128.el5PAE.img

I ensured that the guest has the virtio_blk.ko module.

rpm -qa | grep kvm:  qemu-kvm-0.10.5-6

Guest OS: RHEL 5.3 32/64

Host OS: SlES11 64

Guest OS Image storage type: nfs


Is there anything that I am not doing in correct way?
Please let me know any other information is required.

Thanks
Sudhir


-- 
Sudhir Kumar
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [patch] eventfd - revised interface and cleanups

2009-06-23 Thread Davide Libenzi

On Tue, 23 Jun 2009, Randy Dunlap wrote:

> kernel-doc syntax requires the function name + short description on one line,
> followed by parameters.  Any longer function description then comes after
> the parameters.
> 
> See Documentation/kernel-doc-nano-HOWTO.txt for more info,
> or ask me.

Will fix, thx!


- Davide


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 5/8] kvm/mmu: make direct mapping paths aware of mapping levels

2009-06-23 Thread Joerg Roedel

On Tue, Jun 23, 2009 at 01:47:28PM -0300, Marcelo Tosatti wrote:
> On Fri, Jun 19, 2009 at 03:16:26PM +0200, Joerg Roedel wrote:
> > @@ -254,7 +254,7 @@ static int is_last_spte(u64 pte, int level)
> >  {
> > if (level == PT_PAGE_TABLE_LEVEL)
> > return 1;
> > -   if (level == PT_DIRECTORY_LEVEL && is_large_pte(pte))
> > +   if (is_large_pte(pte))
> > return 1;
> 
> Wouldnt it be safer to check for bit 7 only on the levels
> we're sure it means large page?

If we are not on PT_PAGE_TABLE_LEVEL and this bit does not mean "large page"
then bit 7 is MBZ and harware would fault. So it should be safe to just check
for bit 7 here.


> > kvm_get_pfn(pfn);
> > mmu_set_spte(vcpu, spte, page->role.access, pte_access, 0, 0,
> > -gpte & PT_DIRTY_MASK, NULL, largepage,
> > +gpte & PT_DIRTY_MASK, NULL, level,
> >  gpte_to_gfn(gpte), pfn, true);
> 
> It would be better to just turn off updates to large sptes via  
> update_pte path, so just nuke them and let the pagefault path
> handle.

Yeah true. Thats in the shadow paging patch. Maybe I can get it into a state to
be postable again ;-)

Thanks,

Joerg

-- 
   | Advanced Micro Devices GmbH
 Operating | Karl-Hammerschmidt-Str. 34, 85609 Dornach bei München
 System| 
 Research  | Geschäftsführer: Thomas M. McCoy, Giuliano Meroni
 Center| Sitz: Dornach, Gemeinde Aschheim, Landkreis München
   | Registergericht München, HRB Nr. 43632

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [patch] eventfd - revised interface and cleanups

2009-06-23 Thread Davide Libenzi

On Tue, 23 Jun 2009, Avi Kivity wrote:

> On 06/23/2009 07:47 PM, Davide Libenzi wrote:
> > The following patch changes the eventfd interface to de-couple the eventfd
> > memory context, from the file pointer instance.
> > Without such change, there is no clean way to racely free handle the
> > POLLHUP event sent when the last instance of the file* goes away.
> > Also, now the internal eventfd APIs are using the eventfd context instead
> > of the file*.
> > Another cleanup this patch does, is making AIO select EVENTFD, instead of
> > adding a bunch of empty function stubs inside eventfd.h.
> > 
> > Andrew, this better go via Avi and the KVM tree, since they have patches
> > that will be based on the new interface.
> >
> 
> The kvm patches will only be ready for 2.6.32.  Can this go in 2.6.31 now, and
> we'll meet in 10 weeks?

Either ways works for me.
Meanwhile, I'll repost with the revised documentation format.


- Davide


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] QEMU bug tracker on Launchpad

2009-06-23 Thread Dustin Kirkland

On Tue, 2009-06-23 at 20:02 +0300, Blue Swirl wrote:
> On 6/23/09, Anthony Liguori  wrote:
> > Dustin Kirkland was kind enough to setup a bug tracker for QEMU on
> > Launchpad.  I would like to make this the official QEMU bug tracker unless
> > there is significant objection.
> 
> The links on code tab do not show which is our tree and there are some
> Ubuntu trees:
> https://code.launchpad.net/qemu
> 
> Can that be fixed?

This 'code' tab does not yet support git.  The trees you're seeing are
bzr.  I don't think Anthony has any intention of using the bzr functions
of Launchpad, and the qemu git tree will remain at
git://git.savannah.nongnu.org/qemu.git.

For convenience, I have setup a periodic sync of the latest git (using
git fast-export | bzr fast-import -).

I'll move the Ubuntu trees elsewhere, sorry for the noise.

:-Dustin

signature.asc
Description: This is a digitally signed message part

Re: [patch] eventfd - revised interface and cleanups

2009-06-23 Thread Avi Kivity


On 06/23/2009 07:47 PM, Davide Libenzi wrote:

The following patch changes the eventfd interface to de-couple the eventfd
memory context, from the file pointer instance.
Without such change, there is no clean way to racely free handle the
POLLHUP event sent when the last instance of the file* goes away.
Also, now the internal eventfd APIs are using the eventfd context instead
of the file*.
Another cleanup this patch does, is making AIO select EVENTFD, instead of
adding a bunch of empty function stubs inside eventfd.h.

Andrew, this better go via Avi and the KVM tree, since they have patches
that will be based on the new interface.
   


The kvm patches will only be ready for 2.6.32.  Can this go in 2.6.31 
now, and we'll meet in 10 weeks?


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] QEMU bug tracker on Launchpad

2009-06-23 Thread Blue Swirl

On 6/23/09, Anthony Liguori  wrote:
> Dustin Kirkland was kind enough to setup a bug tracker for QEMU on
> Launchpad.  I would like to make this the official QEMU bug tracker unless
> there is significant objection.

The links on code tab do not show which is our tree and there are some
Ubuntu trees:
https://code.launchpad.net/qemu

Can that be fixed?

>  There are a number of QEMU/KVM bug trackers mostly distribution centric
> today.  Having a common upstream bug tracker will help immensely in
> coordinating bug fixing effort for those people who are interested in that
> sort of thing :-)
>
>  Here are some of the requirements I had for a bug tracker in no particular
> order:
>
>  1) minimal work required for the QEMU maintainers
>  2) ability to link bugs to external bug trackers
>  3) ability to control bug status via mails
>  4) API for scripting
>
>  The biggest issue for me with Launchpad was that it is not open source
> today.  Canonical is actively working to release the source code though and
> have scheduled a date for release this July.  See the link below for more
> information.
>
>  I've already begun using this bug tracker so you can look through it today
> to get a feeling for what it's like.
>
>  https://bugs.launchpad.net/qemu
>  https://bugs.launchpad.net/qemu/+bugs
>  https://dev.launchpad.net/OpenSourcing
>
>  --
>  Regards,
>
>  Anthony Liguori
>
>
>
>
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [patch] eventfd - revised interface and cleanups

2009-06-23 Thread Randy Dunlap

Davide Libenzi wrote:
> The following patch changes the eventfd interface to de-couple the eventfd 
> memory context, from the file pointer instance.
> Without such change, there is no clean way to racely free handle the 
> POLLHUP event sent when the last instance of the file* goes away.
> Also, now the internal eventfd APIs are using the eventfd context instead 
> of the file*.
> Another cleanup this patch does, is making AIO select EVENTFD, instead of 
> adding a bunch of empty function stubs inside eventfd.h.
> 
> Andrew, this better go via Avi and the KVM tree, since they have patches 
> that will be based on the new interface.
> 
> 
> 
> Signed-off-by: Davide Libenzi 
> 
> 
> - Davide
> 
> 
> ---
>  drivers/lguest/lg.h  |2 
>  drivers/lguest/lguest_user.c |4 -
>  fs/aio.c |   24 ++
>  fs/eventfd.c |  101 
> ++-
>  include/linux/aio.h  |4 -
>  include/linux/eventfd.h  |   17 ++-
>  init/Kconfig |1 
>  7 files changed, 108 insertions(+), 45 deletions(-)
> 
> Index: linux-2.6.mod/fs/eventfd.c
> ===
> --- linux-2.6.mod.orig/fs/eventfd.c   2009-06-21 16:54:15.0 -0700
> +++ linux-2.6.mod/fs/eventfd.c2009-06-23 09:34:42.0 -0700
> @@ -17,32 +17,38 @@

> -/*
> - * Adds "n" to the eventfd counter "count". Returns "n" in case of
> - * success, or a value lower then "n" in case of coutner overflow.
> - * This function is supposed to be called by the kernel in paths
> - * that do not allow sleeping. In this function we allow the counter
> - * to reach the ULLONG_MAX value, and we signal this as overflow
> - * condition by returining a POLLERR to poll(2).
> +/**
> + * eventfd_signal - Adds @n to the eventfd counter. This function is
> + *  supposed to be called by the kernel in paths that do not
> + *  allow sleeping. In this function we allow the counter
> + *  to reach the ULLONG_MAX value, and we signal this as
> + *  overflow condition by returining a POLLERR to poll(2).
> + *

kernel-doc syntax requires the function name + short description on one line,
followed by parameters.  Any longer function description then comes after
the parameters.

See Documentation/kernel-doc-nano-HOWTO.txt for more info,
or ask me.

> + * @ctx: [in] Pointer to the eventfd context.
> + * @n: [in] Value of the counter to be added to the eventfd internal counter.
> + *
> + * Returns: In case of success, it returns @n, otherwise (in case of overflow
> + *  of the eventfd 64bit internal counter) a value lower than @n.
>   */
> -int eventfd_signal(struct file *file, int n)
> +int eventfd_signal(struct eventfd_ctx *ctx, int n)
>  {
> - struct eventfd_ctx *ctx = file->private_data;
>   unsigned long flags;
>  
>   if (n < 0)

> +/**
> + * eventfd_ctx_put - Releases a reference to the internal eventfd context
> + *   (previously acquired either with eventfd_ctx_get() or
> + *   eventfd_ctx_fdget()).
> + *
> + * @ctx: [in] Pointer to eventfd context.
> + *
> + * Returns: Nothing.
> + */
> +void eventfd_ctx_put(struct eventfd_ctx *ctx)
> +{
> + kref_put(&ctx->kref, eventfd_free);
> +}
> +EXPORT_SYMBOL_GPL(eventfd_ctx_put);
> +
>  static int eventfd_release(struct inode *inode, struct file *file)
>  {
> - kfree(file->private_data);
> + struct eventfd_ctx *ctx = file->private_data;
> +
> + wake_up_poll(&ctx->wqh, POLLHUP);
> + eventfd_ctx_put(ctx);
>   return 0;
>  }

>  
> +/**
> + * eventfd_ctx_fdget - Acquires a reference to the internal eventfd context
> + * given an eventfd file descriptor.
> + *
> + * @fd: [in] Eventfd file descriptor.
> + *
> + * Returns: In case of success, it returns a pointer to the internal eventfd
> + *  context, otherwise a proper error code.
> + */
> +struct eventfd_ctx *eventfd_ctx_fdget(int fd)
> +{
> + struct file *file;
> + struct eventfd_ctx *ctx;
> +
> + file = eventfd_fget(fd);
> + if (IS_ERR(file))
> + return (struct eventfd_ctx *) file;
> + ctx = eventfd_ctx_get(file->private_data);
> + fput(file);
> +
> + return ctx;
> +}
> +EXPORT_SYMBOL_GPL(eventfd_ctx_fdget);


-- 
~Randy
LPC 2009, Sept. 23-25, Portland, Oregon
http://linuxplumbersconf.org/2009/
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/8] kvm/mmu: rename is_largepage_backed to mapping_level

2009-06-23 Thread Joerg Roedel

On Tue, Jun 23, 2009 at 12:59:33PM -0300, Marcelo Tosatti wrote:
> Hi Joerg,
> 
> On Fri, Jun 19, 2009 at 03:16:24PM +0200, Joerg Roedel wrote:
> > gfn = unalias_gfn(kvm, gfn);
> > -   write_count = slot_largepage_idx(gfn,
> > -gfn_to_memslot_unaliased(kvm, gfn));
> > -   *write_count += 1;
> > +
> > +   for (i = PT_DIRECTORY_LEVEL;
> > +i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
> > +   slot  = gfn_to_memslot_unaliased(kvm, gfn);
> 
> Can't you move this call out of the loop?

True. Will do this.

> > @@ -1704,7 +1739,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
> > if ((pte_access & ACC_WRITE_MASK)
> > || (write_fault && !is_write_protection(vcpu) && !user_fault)) {
> >  
> > -   if (largepage && has_wrprotected_page(vcpu->kvm, gfn)) {
> > +   if (largepage && has_wrprotected_page(vcpu->kvm, gfn, 1)) {
> 
> It seems direct_map is missing the large pte overwrite check that
> fetch() contains:
> 
> if (is_large_pte(*sptep)) {
> rmap_remove(vcpu->kvm, sptep);
> __set_spte(sptep, shadow_trap_nonpresent_pte);
> kvm_flush_remote_tlbs(vcpu->kvm);
> }
> 
> (perhaps its not a possible scenario at the moment, but...).


This function is only called from mmu_set_spte which takes care of this.

Thanks,

Joerg

-- 
   | Advanced Micro Devices GmbH
 Operating | Karl-Hammerschmidt-Str. 34, 85609 Dornach bei München
 System| 
 Research  | Geschäftsführer: Thomas M. McCoy, Giuliano Meroni
 Center| Sitz: Dornach, Gemeinde Aschheim, Landkreis München
   | Registergericht München, HRB Nr. 43632

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[patch] eventfd - revised interface and cleanups

2009-06-23 Thread Davide Libenzi

The following patch changes the eventfd interface to de-couple the eventfd 
memory context, from the file pointer instance.
Without such change, there is no clean way to racely free handle the 
POLLHUP event sent when the last instance of the file* goes away.
Also, now the internal eventfd APIs are using the eventfd context instead 
of the file*.
Another cleanup this patch does, is making AIO select EVENTFD, instead of 
adding a bunch of empty function stubs inside eventfd.h.

Andrew, this better go via Avi and the KVM tree, since they have patches 
that will be based on the new interface.



Signed-off-by: Davide Libenzi 


- Davide


---
 drivers/lguest/lg.h  |2 
 drivers/lguest/lguest_user.c |4 -
 fs/aio.c |   24 ++
 fs/eventfd.c |  101 ++-
 include/linux/aio.h  |4 -
 include/linux/eventfd.h  |   17 ++-
 init/Kconfig |1 
 7 files changed, 108 insertions(+), 45 deletions(-)

Index: linux-2.6.mod/fs/eventfd.c
===
--- linux-2.6.mod.orig/fs/eventfd.c 2009-06-21 16:54:15.0 -0700
+++ linux-2.6.mod/fs/eventfd.c  2009-06-23 09:34:42.0 -0700
@@ -17,32 +17,38 @@
 #include 
 #include 
 #include 
+#include 
 
 struct eventfd_ctx {
+   struct kref kref;
wait_queue_head_t wqh;
/*
 * Every time that a write(2) is performed on an eventfd, the
 * value of the __u64 being written is added to "count" and a
 * wakeup is performed on "wqh". A read(2) will return the "count"
 * value to userspace, and will reset "count" to zero. The kernel
-* size eventfd_signal() also, adds to the "count" counter and
+* side eventfd_signal() also, adds to the "count" counter and
 * issue a wakeup.
 */
__u64 count;
unsigned int flags;
 };
 
-/*
- * Adds "n" to the eventfd counter "count". Returns "n" in case of
- * success, or a value lower then "n" in case of coutner overflow.
- * This function is supposed to be called by the kernel in paths
- * that do not allow sleeping. In this function we allow the counter
- * to reach the ULLONG_MAX value, and we signal this as overflow
- * condition by returining a POLLERR to poll(2).
+/**
+ * eventfd_signal - Adds @n to the eventfd counter. This function is
+ *  supposed to be called by the kernel in paths that do not
+ *  allow sleeping. In this function we allow the counter
+ *  to reach the ULLONG_MAX value, and we signal this as
+ *  overflow condition by returining a POLLERR to poll(2).
+ *
+ * @ctx: [in] Pointer to the eventfd context.
+ * @n: [in] Value of the counter to be added to the eventfd internal counter.
+ *
+ * Returns: In case of success, it returns @n, otherwise (in case of overflow
+ *  of the eventfd 64bit internal counter) a value lower than @n.
  */
-int eventfd_signal(struct file *file, int n)
+int eventfd_signal(struct eventfd_ctx *ctx, int n)
 {
-   struct eventfd_ctx *ctx = file->private_data;
unsigned long flags;
 
if (n < 0)
@@ -59,9 +65,49 @@ int eventfd_signal(struct file *file, in
 }
 EXPORT_SYMBOL_GPL(eventfd_signal);
 
+static void eventfd_free(struct kref *kref)
+{
+   struct eventfd_ctx *ctx = container_of(kref, struct eventfd_ctx, kref);
+
+   kfree(ctx);
+}
+
+/**
+ * eventfd_ctx_get - Acquires a reference to the internal eventfd context.
+ *
+ * @ctx: [in] Pointer to the eventfd context.
+ *
+ * Returns: In case of success, returns a pointer to the eventfd context,
+ *  otherwise a proper error code.
+ */
+struct eventfd_ctx *eventfd_ctx_get(struct eventfd_ctx *ctx)
+{
+   kref_get(&ctx->kref);
+   return ctx;
+}
+EXPORT_SYMBOL_GPL(eventfd_ctx_get);
+
+/**
+ * eventfd_ctx_put - Releases a reference to the internal eventfd context
+ *   (previously acquired either with eventfd_ctx_get() or
+ *   eventfd_ctx_fdget()).
+ *
+ * @ctx: [in] Pointer to eventfd context.
+ *
+ * Returns: Nothing.
+ */
+void eventfd_ctx_put(struct eventfd_ctx *ctx)
+{
+   kref_put(&ctx->kref, eventfd_free);
+}
+EXPORT_SYMBOL_GPL(eventfd_ctx_put);
+
 static int eventfd_release(struct inode *inode, struct file *file)
 {
-   kfree(file->private_data);
+   struct eventfd_ctx *ctx = file->private_data;
+
+   wake_up_poll(&ctx->wqh, POLLHUP);
+   eventfd_ctx_put(ctx);
return 0;
 }
 
@@ -185,6 +231,14 @@ static const struct file_operations even
.write  = eventfd_write,
 };
 
+/**
+ * eventfd_fget - Acquire a reference of an eventfd file descriptor.
+ *
+ * @fd: [in] Eventfd file descriptor.
+ *
+ * Returns: A pointer to the eventfd file structure in case of success, or a
+ *  proper error pointer in case of failure.
+ */
 struct file *eventfd_fget(int fd)
 {
struct file *fil

Re: [PATCH 5/8] kvm/mmu: make direct mapping paths aware of mapping levels

2009-06-23 Thread Marcelo Tosatti

On Fri, Jun 19, 2009 at 03:16:26PM +0200, Joerg Roedel wrote:
> Signed-off-by: Joerg Roedel 
> ---
>  arch/x86/include/asm/kvm_host.h |2 +-
>  arch/x86/kvm/mmu.c  |   60 
> ++-
>  arch/x86/kvm/paging_tmpl.h  |6 ++--
>  3 files changed, 38 insertions(+), 30 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 930bac2..397f992 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -314,7 +314,7 @@ struct kvm_vcpu_arch {
>   struct {
>   gfn_t gfn;  /* presumed gfn during guest pte update */
>   pfn_t pfn;  /* pfn corresponding to that gfn */
> - int largepage;
> + int level;
>   unsigned long mmu_seq;
>   } update_pte;
>  
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 0ef947d..ef2396d 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -254,7 +254,7 @@ static int is_last_spte(u64 pte, int level)
>  {
>   if (level == PT_PAGE_TABLE_LEVEL)
>   return 1;
> - if (level == PT_DIRECTORY_LEVEL && is_large_pte(pte))
> + if (is_large_pte(pte))
>   return 1;

Wouldnt it be safer to check for bit 7 only on the levels
we're sure it means large page?

>   return 0;
>  }
> @@ -1708,7 +1708,7 @@ static int mmu_need_write_protect(struct kvm_vcpu 
> *vcpu, gfn_t gfn,
>  
>  static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>   unsigned pte_access, int user_fault,
> - int write_fault, int dirty, int largepage,
> + int write_fault, int dirty, int level,
>   gfn_t gfn, pfn_t pfn, bool speculative,
>   bool can_unsync)
>  {
> @@ -1731,7 +1731,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>   spte |= shadow_nx_mask;
>   if (pte_access & ACC_USER_MASK)
>   spte |= shadow_user_mask;
> - if (largepage)
> + if (level > PT_PAGE_TABLE_LEVEL)
>   spte |= PT_PAGE_SIZE_MASK;
>   if (tdp_enabled)
>   spte |= kvm_x86_ops->get_mt_mask(vcpu, gfn,
> @@ -1742,7 +1742,8 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>   if ((pte_access & ACC_WRITE_MASK)
>   || (write_fault && !is_write_protection(vcpu) && !user_fault)) {
>  
> - if (largepage && has_wrprotected_page(vcpu->kvm, gfn, 1)) {
> + if (level > PT_PAGE_TABLE_LEVEL &&
> + has_wrprotected_page(vcpu->kvm, gfn, level)) {
>   ret = 1;
>   spte = shadow_trap_nonpresent_pte;
>   goto set_pte;
> @@ -1780,7 +1781,7 @@ set_pte:
>  static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>unsigned pt_access, unsigned pte_access,
>int user_fault, int write_fault, int dirty,
> -  int *ptwrite, int largepage, gfn_t gfn,
> +  int *ptwrite, int level, gfn_t gfn,
>pfn_t pfn, bool speculative)
>  {
>   int was_rmapped = 0;
> @@ -1796,7 +1797,8 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 
> *sptep,
>* If we overwrite a PTE page pointer with a 2MB PMD, unlink
>* the parent of the now unreachable PTE.
>*/
> - if (largepage && !is_large_pte(*sptep)) {
> + if (level > PT_PAGE_TABLE_LEVEL &&
> + !is_large_pte(*sptep)) {

Should probably get rid of this entirely (or BUG_ON), since a better
place to handle this is at fetch / direct_map as mentioned in the other
email.

But we can do that later, incrementally.

>   struct kvm_mmu_page *child;
>   u64 pte = *sptep;
>  
> @@ -1809,8 +1811,9 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 
> *sptep,
>   } else
>   was_rmapped = 1;
>   }
> +
>   if (set_spte(vcpu, sptep, pte_access, user_fault, write_fault,
> -   dirty, largepage, gfn, pfn, speculative, true)) {
> +   dirty, level, gfn, pfn, speculative, true)) {
>   if (write_fault)
>   *ptwrite = 1;
>   kvm_x86_ops->tlb_flush(vcpu);
> @@ -1846,7 +1849,7 @@ static void nonpaging_new_cr3(struct kvm_vcpu *vcpu)
>  }
>  
>  static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
> - int largepage, gfn_t gfn, pfn_t pfn)
> + int level, gfn_t gfn, pfn_t pfn)
>  {
>   struct kvm_shadow_walk_iterator iterator;
>   struct kvm_mmu_page *sp;
> @@ -1854,11 +1857,10 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t 
> v, int write,
>   gfn_t pseudo_gfn;
>  
>   for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
> - if (iterator.level == PT_PAGE_TABLE_LEVEL
> - || (largepag

Re: [PATCH 2/8] kvm: change memslot data structures for multiple hugepage sizes

2009-06-23 Thread Marcelo Tosatti

On Fri, Jun 19, 2009 at 03:16:23PM +0200, Joerg Roedel wrote:
>  /*
> @@ -724,11 +724,11 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned 
> long hva,
>   end = start + (memslot->npages << PAGE_SHIFT);
>   if (hva >= start && hva < end) {
>   gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT;
> + int idx = gfn_offset /
> +   KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL);
>   retval |= handler(kvm, &memslot->rmap[gfn_offset]);
>   retval |= handler(kvm,
> -   &memslot->lpage_info[
> -   gfn_offset /
> -   
> KVM_PAGES_PER_HPAGE].rmap_pde);
> + &memslot->lpage_info[0][idx].rmap_pde);
>   }

Cannot find where you update this function to reflect 1GB pages in the
series?

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/8] kvm/mmu: rename is_largepage_backed to mapping_level

2009-06-23 Thread Marcelo Tosatti

Hi Joerg,

On Fri, Jun 19, 2009 at 03:16:24PM +0200, Joerg Roedel wrote:
> With the new name and the corresponding backend changes this function
> can now support multiple hugepage sizes.
> 
> Signed-off-by: Joerg Roedel 
> ---
>  arch/x86/kvm/mmu.c |  100 +--
>  arch/x86/kvm/paging_tmpl.h |4 +-
>  2 files changed, 69 insertions(+), 35 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 1f24d88..3fa6009 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -390,37 +390,52 @@ static void mmu_free_rmap_desc(struct kvm_rmap_desc *rd)
>   * Return the pointer to the largepage write count for a given
>   * gfn, handling slots that are not large page aligned.
>   */
> -static int *slot_largepage_idx(gfn_t gfn, struct kvm_memory_slot *slot)
> +static int *slot_largepage_idx(gfn_t gfn,
> +struct kvm_memory_slot *slot,
> +int level)
>  {
>   unsigned long idx;
>  
> - idx = (gfn / KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL)) -
> -   (slot->base_gfn / KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL));
> - return &slot->lpage_info[0][idx].write_count;
> + idx = (gfn / KVM_PAGES_PER_HPAGE(level)) -
> +   (slot->base_gfn / KVM_PAGES_PER_HPAGE(level));
> + return &slot->lpage_info[level - 2][idx].write_count;
>  }
>  
>  static void account_shadowed(struct kvm *kvm, gfn_t gfn)
>  {
> + struct kvm_memory_slot *slot;
>   int *write_count;
> + int i;
>  
>   gfn = unalias_gfn(kvm, gfn);
> - write_count = slot_largepage_idx(gfn,
> -  gfn_to_memslot_unaliased(kvm, gfn));
> - *write_count += 1;
> +
> + for (i = PT_DIRECTORY_LEVEL;
> +  i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
> + slot  = gfn_to_memslot_unaliased(kvm, gfn);

Can't you move this call out of the loop?

> + write_count   = slot_largepage_idx(gfn, slot, i);
> + *write_count += 1;
> + }
>  }
>  
>  static void unaccount_shadowed(struct kvm *kvm, gfn_t gfn)
>  {
> + struct kvm_memory_slot *slot;
>   int *write_count;
> + int i;
>  
>   gfn = unalias_gfn(kvm, gfn);
> - write_count = slot_largepage_idx(gfn,
> -  gfn_to_memslot_unaliased(kvm, gfn));
> - *write_count -= 1;
> - WARN_ON(*write_count < 0);
> + for (i = PT_DIRECTORY_LEVEL;
> +  i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
> + slot  = gfn_to_memslot_unaliased(kvm, gfn);
> + write_count   = slot_largepage_idx(gfn, slot, i);
> + *write_count -= 1;
> + WARN_ON(*write_count < 0);
> + }
>  }
>  
> -static int has_wrprotected_page(struct kvm *kvm, gfn_t gfn)
> +static int has_wrprotected_page(struct kvm *kvm,
> + gfn_t gfn,
> + int level)
>  {
>   struct kvm_memory_slot *slot;
>   int *largepage_idx;
> @@ -428,47 +443,67 @@ static int has_wrprotected_page(struct kvm *kvm, gfn_t 
> gfn)
>   gfn = unalias_gfn(kvm, gfn);
>   slot = gfn_to_memslot_unaliased(kvm, gfn);
>   if (slot) {
> - largepage_idx = slot_largepage_idx(gfn, slot);
> + largepage_idx = slot_largepage_idx(gfn, slot, level);
>   return *largepage_idx;
>   }
>  
>   return 1;
>  }
>  
> -static int host_largepage_backed(struct kvm *kvm, gfn_t gfn)
> +static int host_mapping_level(struct kvm *kvm, gfn_t gfn)
>  {
> + unsigned long page_size = PAGE_SIZE;
>   struct vm_area_struct *vma;
>   unsigned long addr;
> - int ret = 0;
> + int i, ret = 0;
>  
>   addr = gfn_to_hva(kvm, gfn);
>   if (kvm_is_error_hva(addr))
> - return ret;
> + return page_size;
>  
>   down_read(¤t->mm->mmap_sem);
>   vma = find_vma(current->mm, addr);
> - if (vma && is_vm_hugetlb_page(vma))
> - ret = 1;
> + if (!vma)
> + goto out;
> +
> + page_size = vma_kernel_pagesize(vma);
> +
> +out:
>   up_read(¤t->mm->mmap_sem);
>  
> + for (i = PT_PAGE_TABLE_LEVEL;
> +  i < (PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES); ++i) {
> + if (page_size >= KVM_HPAGE_SIZE(i))
> + ret = i;
> + else
> + break;
> + }
> +
>   return ret;
>  }
>  
> -static int is_largepage_backed(struct kvm_vcpu *vcpu, gfn_t large_gfn)
> +static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn)
>  {
>   struct kvm_memory_slot *slot;
> -
> - if (has_wrprotected_page(vcpu->kvm, large_gfn))
> - return 0;
> -
> - if (!host_largepage_backed(vcpu->kvm, large_gfn))
> - return 0;
> + int host_level;
> + int level = PT_PAGE_TABLE_LEVEL;
>  
>   slot = gfn_to_memslot(vcpu->kvm, large_gfn);
>   if (slot && slot->dirty_bitmap)
> - re

Re: [PATCH] kvm: remove in_range from kvm_io_device

2009-06-23 Thread Gregory Haskins

Michael S. Tsirkin wrote:
> On Tue, Jun 23, 2009 at 11:44:57AM -0400, Gregory Haskins wrote:
>   
  This proposed approach forces us into a
 potential O(256) algorithm in the hotpath (all MMIO/PIO exits will hit
 this, not just in-kernel users).  How would you address this?
 
 
>>> Two ideas that come to mind:
>>> - add addr/len fields to devices, use these to speed up lookup
>>>   
>>>   
>> Yep, thats what I was thinking as well.  We can have the top-level
>> (group) be an rbtree on addr/len, and then walk the list of items at
>> that address linearly using your read/write() approach.
>>
>>
>> 
>>> - add a small cache that can be scanned first
>>>   
>>>   
>> Yep, I think we may want to do this anyway independent of the search alg.
>>
>> 
>>> In both cases, you first do a fast lookup, ask the device whether
>>> it wants the transaction, then resort to linear scan if not
>>>   
>>>   
>> -Greg
>>
>> 
>
> Looks like we have a concensus then.
>
>   

Yep.  I'll try to go over the patch in detail today and provide any more
feedback.

Thanks Michael,
-Greg



signature.asc
Description: OpenPGP digital signature

Re: [PATCH] kvm: remove in_range from kvm_io_device

2009-06-23 Thread Michael S. Tsirkin

On Tue, Jun 23, 2009 at 11:44:57AM -0400, Gregory Haskins wrote:
> >>  This proposed approach forces us into a
> >> potential O(256) algorithm in the hotpath (all MMIO/PIO exits will hit
> >> this, not just in-kernel users).  How would you address this?
> >> 
> >
> > Two ideas that come to mind:
> > - add addr/len fields to devices, use these to speed up lookup
> >   
> 
> Yep, thats what I was thinking as well.  We can have the top-level
> (group) be an rbtree on addr/len, and then walk the list of items at
> that address linearly using your read/write() approach.
> 
> 
> > - add a small cache that can be scanned first
> >   
> 
> Yep, I think we may want to do this anyway independent of the search alg.
> 
> > In both cases, you first do a fast lookup, ask the device whether
> > it wants the transaction, then resort to linear scan if not
> >   
> 
> -Greg
> 

Looks like we have a concensus then.

-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] Planning for the 0.11.0 release

2009-06-23 Thread Avi Kivity


On 06/23/2009 06:09 PM, Blue Swirl wrote:

I think this is great, but OpenBIOS still uses Subversion. Can git use
SVN submodules for example?
   


No, but we could have a git svn mirror and include that.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] Planning for the 0.11.0 release

2009-06-23 Thread Anthony Liguori


Blue Swirl wrote:

On 6/23/09, Anthony Liguori  wrote:
  

Hi,

 It's getting to be about the time to start thinking about the 0.11.0
release.  0.10.0 was released on March 2nd so following with the 6 month
release cycle, that would put 0.11.0 at September 2nd.

 Based on the experiences with the stable releases, here's what I'd
recommend:

 o On July 15th, fork master -> stable-0.11
 o Change version to 0.10.90
 o Release qemu-0.11.0-rc1
 o Release additional -rcN releases every 1-2 weeks
 o Introduce a new maintainer for stable-0.10 (via git pulls)
 o At least 1 week before release, hopefully we'll have the final -rcN that
we can then declare 0.11.0.



Sounds OK. I think OpenBIOS releases should follow similar schedule,
maybe even with matching SVN tags (1.1-rc1 for 0.11.0-rc1 etc).

  

 I think we should really try hard to make these dates.  I only have a few
things that I would like to see happen before forking stable-0.11.  Namely:

 o Setup qemu.org infrastructure (git hosting, wiki)
 o Setup qemu bug tracker (see next mail)
 o Include all ROM source code in tree via git submodules.  This is a major
headache for distributors and I think it's important to resolve before our
next release.



I think this is great, but OpenBIOS still uses Subversion. Can git use
SVN submodules for example?
  

No, but it's not too hard to make svn->git bridges.

Regards,

Anthony Liguori


--
Regards,

Anthony Liguori

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] kvm: remove in_range from kvm_io_device

2009-06-23 Thread Gregory Haskins

Michael S. Tsirkin wrote:
> On Tue, Jun 23, 2009 at 11:21:53AM -0400, Gregory Haskins wrote:
>   
>> Michael S. Tsirkin wrote:
>> 
>>> Remove in_range from kvm_io_device and ask read/write callbacks, if
>>> supplied, to perform range checks internally. This allows aliasing
>>> (mostly for in-kernel virtio), as well as better error handling by
>>> making it possible to pass errors up to userspace. And it's enough to
>>> look at the diffstat to see that it's a better API anyway.
>>>
>>> While we are at it, document locking rules for kvm_io_device.
>>>   
>>>   
>> Sorry, not trying to be a PITA, but I liked your last suggestion better.  :(
>>
>> I am thinking forward to when we want to use something smarter than a
>> linear search (like rbtree/radix) for scaling the number of "devices"
>> (really, virtio-rings) that we support.
>> 
>
> in_range is broken for this anyway: you need more than a boolean
> predicate to implement rbtree/radix
>   

Yes, understood..in_range() needs to be (pardon the pun) "addressed"
;).  But getting rid of in_range() and moving the match logic into the
read()/write() verbs is potentially a step in the wrong direction if we
ever wanted to go that route.  And I'm pretty sure we do.

>   
>> The current device-count
>> target is 512, which we will begin to rapidly consume as the in-kernel
>> virtio work progresses.
>> 
>
> That's a large number. I had in mind more like 4 virtio devices, for
> starters: 1 for each virtqueue in net and block.
>   

Thats way to low.  For instance, I'll be wanting to do things like
802.1p which would be 16 virtio-rings per device (8 prio levels tx, 8
levels rx).  And thats just for one device.  I think Avi came up with an
estimate of supporting 20 devices @ 16 queues = 320, so we rounded it to
512.
>   
>>  This proposed approach forces us into a
>> potential O(256) algorithm in the hotpath (all MMIO/PIO exits will hit
>> this, not just in-kernel users).  How would you address this?
>> 
>
> Two ideas that come to mind:
> - add addr/len fields to devices, use these to speed up lookup
>   

Yep, thats what I was thinking as well.  We can have the top-level
(group) be an rbtree on addr/len, and then walk the list of items at
that address linearly using your read/write() approach.


> - add a small cache that can be scanned first
>   

Yep, I think we may want to do this anyway independent of the search alg.

> In both cases, you first do a fast lookup, ask the device whether
> it wants the transaction, then resort to linear scan if not
>   

-Greg





signature.asc
Description: OpenPGP digital signature

Re: [PATCH] kvm: remove in_range from kvm_io_device

2009-06-23 Thread Michael S. Tsirkin

On Tue, Jun 23, 2009 at 11:21:53AM -0400, Gregory Haskins wrote:
> Michael S. Tsirkin wrote:
> > Remove in_range from kvm_io_device and ask read/write callbacks, if
> > supplied, to perform range checks internally. This allows aliasing
> > (mostly for in-kernel virtio), as well as better error handling by
> > making it possible to pass errors up to userspace. And it's enough to
> > look at the diffstat to see that it's a better API anyway.
> >
> > While we are at it, document locking rules for kvm_io_device.
> >   
> 
> Sorry, not trying to be a PITA, but I liked your last suggestion better.  :(
> 
> I am thinking forward to when we want to use something smarter than a
> linear search (like rbtree/radix) for scaling the number of "devices"
> (really, virtio-rings) that we support.

in_range is broken for this anyway: you need more than a boolean
predicate to implement rbtree/radix

> The current device-count
> target is 512, which we will begin to rapidly consume as the in-kernel
> virtio work progresses.

That's a large number. I had in mind more like 4 virtio devices, for
starters: 1 for each virtqueue in net and block.

>  This proposed approach forces us into a
> potential O(256) algorithm in the hotpath (all MMIO/PIO exits will hit
> this, not just in-kernel users).  How would you address this?

Two ideas that come to mind:
- add addr/len fields to devices, use these to speed up lookup
- add a small cache that can be scanned first

In both cases, you first do a fast lookup, ask the device whether
it wants the transaction, then resort to linear scan if not

-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[patch 0/3] fixes for kvm on s390

2009-06-23 Thread Christian Borntraeger

Hello Avi,

here are three patches against kvm.git that fix several issues in our
kvm port. Please review and consider all patches for 2.6.31.

Christian

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[patch 3/3] kvm-s390: Remove some unused variables

2009-06-23 Thread Christian Borntraeger

From: Christian Borntraeger 

This patch fixes the following warnings that were introduced by
commit 2921292f45733bccdb53e426bcf65ceb13f53d94
Author: Gleb Natapov 
KVM: Use macro to iterate over vcpus.

arch/s390/kvm/kvm-s390.c: In function 'kvm_arch_set_memory_region':
arch/s390/kvm/kvm-s390.c:687: warning: unused variable 'r'
arch/s390/kvm/kvm-s390.c:687: warning: unused variable 'j'

CC: Gleb Natapov 
Signed-off-by: Christian Borntraeger 
---
 arch/s390/kvm/kvm-s390.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: kvm/arch/s390/kvm/kvm-s390.c
===
--- kvm.orig/arch/s390/kvm/kvm-s390.c
+++ kvm/arch/s390/kvm/kvm-s390.c
@@ -684,7 +684,7 @@ int kvm_arch_set_memory_region(struct kv
struct kvm_memory_slot old,
int user_alloc)
 {
-   int i, j = 0, r = -EINVAL;
+   int i;
struct kvm_vcpu *vcpu;
 
/* A few sanity checks. We can have exactly one memory slot which has

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[patch 2/3] kvm-s390: Allow stfle instruction in the guest

2009-06-23 Thread Christian Borntraeger

From: Christian Borntraeger 

2.6.31-rc introduced an architecture level set checker based on facility
bits. e.g. if the kernel is compiled to run only on z9, several facility 
bits are checked very early and the kernel refuses to boot if a z9 specific
facility is missing.
Until now kvm on s390 did not implement the store facility extended (STFLE)
instruction. A 2.6.31-rc kernel that was compiled for z9 or higher did not
boot in kvm. This patch implements stfle.

This patch should go in before 2.6.31.

Signed-off-by: Christian Borntraeger 
---
 arch/s390/include/asm/kvm_host.h |4 +++-
 arch/s390/kvm/kvm-s390.c |   23 ++-
 arch/s390/kvm/priv.c |2 +-
 3 files changed, 26 insertions(+), 3 deletions(-)

Index: kvm/arch/s390/include/asm/kvm_host.h
===
--- kvm.orig/arch/s390/include/asm/kvm_host.h
+++ kvm/arch/s390/include/asm/kvm_host.h
@@ -99,7 +99,9 @@ struct kvm_s390_sie_block {
__u8reservedd0[48]; /* 0x00d0 */
__u64   gcr[16];/* 0x0100 */
__u64   gbea;   /* 0x0180 */
-   __u8reserved188[120];   /* 0x0188 */
+   __u8reserved188[24];/* 0x0188 */
+   __u32   fac;/* 0x01a0 */
+   __u8reserved1a4[92];/* 0x01a4 */
 } __attribute__((packed));
 
 struct kvm_vcpu_stat {
Index: kvm/arch/s390/kvm/kvm-s390.c
===
--- kvm.orig/arch/s390/kvm/kvm-s390.c
+++ kvm/arch/s390/kvm/kvm-s390.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "kvm-s390.h"
 #include "gaccess.h"
 
@@ -70,6 +71,7 @@ struct kvm_stats_debugfs_item debugfs_en
{ NULL }
 };
 
+static unsigned long long *facilities;
 
 /* Section: not file related */
 void kvm_arch_hardware_enable(void *garbage)
@@ -287,6 +289,7 @@ int kvm_arch_vcpu_setup(struct kvm_vcpu 
set_bit(KVM_REQ_MMU_RELOAD, &vcpu->requests);
vcpu->arch.sie_block->ecb   = 2;
vcpu->arch.sie_block->eca   = 0xC1002001U;
+   vcpu->arch.sie_block->fac   = (int) (long) facilities;
hrtimer_init(&vcpu->arch.ckc_timer, CLOCK_REALTIME, HRTIMER_MODE_ABS);
tasklet_init(&vcpu->arch.tasklet, kvm_s390_tasklet,
 (unsigned long) vcpu);
@@ -727,11 +730,29 @@ gfn_t unalias_gfn(struct kvm *kvm, gfn_t
 
 static int __init kvm_s390_init(void)
 {
-   return kvm_init(NULL, sizeof(struct kvm_vcpu), THIS_MODULE);
+   int ret;
+   ret = kvm_init(NULL, sizeof(struct kvm_vcpu), THIS_MODULE);
+   if (ret)
+   return ret;
+
+   /*
+* guests can ask for up to 255+1 double words, we need a full page
+* to hold the maximum amount of facilites. On the other hand, we
+* only set facilities that are known to work in KVM.
+*/
+   facilities = (unsigned long long *) get_zeroed_page(GFP_DMA);
+   if (!facilities) {
+   kvm_exit();
+   return -ENOMEM;
+   }
+   stfle(facilities, 1);
+   facilities[0] &= 0xff00fff3f070ULL;
+   return 0;
 }
 
 static void __exit kvm_s390_exit(void)
 {
+   free_page((unsigned long) facilities);
kvm_exit();
 }
 
Index: kvm/arch/s390/kvm/priv.c
===
--- kvm.orig/arch/s390/kvm/priv.c
+++ kvm/arch/s390/kvm/priv.c
@@ -158,7 +158,7 @@ static int handle_stfl(struct kvm_vcpu *
 
vcpu->stat.instruction_stfl++;
/* only pass the facility bits, which we can handle */
-   facility_list &= 0xfe00fff3;
+   facility_list &= 0xff00fff3;
 
rc = copy_to_guest(vcpu, offsetof(struct _lowcore, stfl_fac_list),
   &facility_list, sizeof(facility_list));

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[patch 1/3] kvm-s390: Fix memslot initialization for userspace_addr != 0

2009-06-23 Thread Christian Borntraeger

From: Christian Borntraeger 

Since
commit 854b5338196b1175706e99d63be43a4f8d8ab607
Author: Christian Ehrhardt 
KVM: s390: streamline memslot handling

s390 uses the values of the memslot instead of doing everything in the arch
ioctl handler of the KVM_SET_USER_MEMORY_REGION. Unfortunately we missed to
set the userspace_addr of our memslot due to our s390 ifdef in 
__kvm_set_memory_region.
Old s390 userspace launchers did not notice, since they started the guest at
userspace address 0. 
Because of CONFIG_DEFAULT_MMAP_MIN_ADDR we now put the guest at 1M userspace,
which does not work. This patch makes sure that new.userspace_addr is set
on s390.
This fix should go in quickly. Nevertheless, looking at the code we should
clean up that ifdef in the long term. Any kernel janitors?

Signed-off-by: Christian Borntraeger 
---
 virt/kvm/kvm_main.c |4 
 1 file changed, 4 insertions(+)

Index: kvm/virt/kvm/kvm_main.c
===
--- kvm.orig/virt/kvm/kvm_main.c
+++ kvm/virt/kvm/kvm_main.c
@@ -1199,6 +1199,10 @@ int __kvm_set_memory_region(struct kvm *
if (old.npages)
kvm_arch_flush_shadow(kvm);
}
+#else  /* not defined CONFIG_S390 */
+   new.user_alloc = user_alloc;
+   if (user_alloc)
+   new.userspace_addr = mem->userspace_addr;
 #endif /* not defined CONFIG_S390 */
 
if (!npages)

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] kvm: remove in_range from kvm_io_device

2009-06-23 Thread Gregory Haskins

Michael S. Tsirkin wrote:
> Remove in_range from kvm_io_device and ask read/write callbacks, if
> supplied, to perform range checks internally. This allows aliasing
> (mostly for in-kernel virtio), as well as better error handling by
> making it possible to pass errors up to userspace. And it's enough to
> look at the diffstat to see that it's a better API anyway.
>
> While we are at it, document locking rules for kvm_io_device.
>   

Sorry, not trying to be a PITA, but I liked your last suggestion better.  :(

I am thinking forward to when we want to use something smarter than a
linear search (like rbtree/radix) for scaling the number of "devices"
(really, virtio-rings) that we support.   The current device-count
target is 512, which we will begin to rapidly consume as the in-kernel
virtio work progresses.  This proposed approach forces us into a
potential O(256) algorithm in the hotpath (all MMIO/PIO exits will hit
this, not just in-kernel users).  How would you address this?

Regards,
-Greg

> Signed-off-by: Michael S. Tsirkin 
> ---
>
> This is a replacement for my patch extending in_range.
> This works for me, but please review carefully.
>
>  arch/ia64/kvm/kvm-ia64.c  |   28 -
>  arch/x86/kvm/i8254.c  |   49 --
>  arch/x86/kvm/i8259.c  |   22 ++
>  arch/x86/kvm/lapic.c  |   48 ++
>  arch/x86/kvm/x86.c|  100 
> +++--
>  include/linux/kvm_host.h  |6 ++-
>  virt/kvm/coalesced_mmio.c |   26 +---
>  virt/kvm/ioapic.c |   24 ++-
>  virt/kvm/iodev.h  |   34 ++-
>  virt/kvm/kvm_main.c   |   25 +++-
>  10 files changed, 158 insertions(+), 204 deletions(-)
>
> diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
> index d20a5db..647ff91 100644
> --- a/arch/ia64/kvm/kvm-ia64.c
> +++ b/arch/ia64/kvm/kvm-ia64.c
> @@ -198,16 +198,6 @@ int kvm_dev_ioctl_check_extension(long ext)
>  
>  }
>  
> -static struct kvm_io_device *vcpu_find_mmio_dev(struct kvm_vcpu *vcpu,
> - gpa_t addr, int len, int is_write)
> -{
> - struct kvm_io_device *dev;
> -
> - dev = kvm_io_bus_find_dev(&vcpu->kvm->mmio_bus, addr, len, is_write);
> -
> - return dev;
> -}
> -
>  static int handle_vm_error(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
>  {
>   kvm_run->exit_reason = KVM_EXIT_UNKNOWN;
> @@ -219,6 +209,7 @@ static int handle_mmio(struct kvm_vcpu *vcpu, struct 
> kvm_run *kvm_run)
>  {
>   struct kvm_mmio_req *p;
>   struct kvm_io_device *mmio_dev;
> + int r;
>  
>   p = kvm_get_vcpu_ioreq(vcpu);
>  
> @@ -235,16 +226,13 @@ static int handle_mmio(struct kvm_vcpu *vcpu, struct 
> kvm_run *kvm_run)
>   kvm_run->exit_reason = KVM_EXIT_MMIO;
>   return 0;
>  mmio:
> - mmio_dev = vcpu_find_mmio_dev(vcpu, p->addr, p->size, !p->dir);
> - if (mmio_dev) {
> - if (!p->dir)
> - kvm_iodevice_write(mmio_dev, p->addr, p->size,
> - &p->data);
> - else
> - kvm_iodevice_read(mmio_dev, p->addr, p->size,
> - &p->data);
> -
> - } else
> + if (p->dir)
> + r = kvm_io_bus_read(&vcpu->kvm->mmio_bus, p->addr, p->size,
> + &p->data);
> + else
> + r = kvm_io_bus_write(&vcpu->kvm->mmio_bus, p->addr, p->size,
> +  &p->data);
> + if (r)
>   printk(KERN_ERR"kvm: No iodevice found! addr:%lx\n", p->addr);
>   p->state = STATE_IORESP_READY;
>  
> diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
> index c13bb92..d5881a4 100644
> --- a/arch/x86/kvm/i8254.c
> +++ b/arch/x86/kvm/i8254.c
> @@ -336,8 +336,14 @@ void kvm_pit_load_count(struct kvm *kvm, int channel, 
> u32 val)
>   mutex_unlock(&kvm->arch.vpit->pit_state.lock);
>  }
>  
> -static void pit_ioport_write(struct kvm_io_device *this,
> -  gpa_t addr, int len, const void *data)
> +static inline int pit_in_range(gpa_t addr)
> +{
> + return ((addr >= KVM_PIT_BASE_ADDRESS) &&
> + (addr < KVM_PIT_BASE_ADDRESS + KVM_PIT_MEM_LENGTH));
> +}
> +
> +static int pit_ioport_write(struct kvm_io_device *this,
> + gpa_t addr, int len, const void *data)
>  {
>   struct kvm_pit *pit = (struct kvm_pit *)this->private;
>   struct kvm_kpit_state *pit_state = &pit->pit_state;
> @@ -345,6 +351,8 @@ static void pit_ioport_write(struct kvm_io_device *this,
>   int channel, access;
>   struct kvm_kpit_channel_state *s;
>   u32 val = *(u32 *) data;
> + if (!pit_in_range(addr))
> + return -EOPNOTSUPP;
>  
>   val  &= 0xff;
>   addr &= KVM_PIT_CHANNEL_MASK;
> @@ -407,16 +415,19 @@ static void pit_ioport_write(struct kvm_io_device *this,
>   }
>  
>   mutex_unlock(&pit_state->loc

Re: [Qemu-devel] [PATCH] virtio-serial: virtio device for simple host <-> guest communication

2009-06-23 Thread Blue Swirl

On 6/23/09, Amit Shah  wrote:
> This interface presents a char device from which bits can be
>  sent and read.

>  +struct virtio_serial_config
>  +{
>  +uint32_t nr_ports;
>  +uint16_t status;
>  +} __attribute__((packed));

Obviously this has to match the kernel structure if you go for 16 bit nr_ports.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [PATCH] virtio_serial: A char device for simple guest <-> host communication

2009-06-23 Thread Blue Swirl

On 6/23/09, Amit Shah  wrote:
> We expose multiple char devices ("ports") for simple communication
>  between the host userspace and guest.
>  +struct virtio_serial_config {
>  +   __u32 nr_ports;
>  +   __u16 status;
>  +} __attribute__((packed));

There is still structure packing. I'd use __u16 for both fields, do
you really need 4 gigs of ports?.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] Planning for the 0.11.0 release

2009-06-23 Thread Blue Swirl

On 6/23/09, Anthony Liguori  wrote:
> Hi,
>
>  It's getting to be about the time to start thinking about the 0.11.0
> release.  0.10.0 was released on March 2nd so following with the 6 month
> release cycle, that would put 0.11.0 at September 2nd.
>
>  Based on the experiences with the stable releases, here's what I'd
> recommend:
>
>  o On July 15th, fork master -> stable-0.11
>  o Change version to 0.10.90
>  o Release qemu-0.11.0-rc1
>  o Release additional -rcN releases every 1-2 weeks
>  o Introduce a new maintainer for stable-0.10 (via git pulls)
>  o At least 1 week before release, hopefully we'll have the final -rcN that
> we can then declare 0.11.0.

Sounds OK. I think OpenBIOS releases should follow similar schedule,
maybe even with matching SVN tags (1.1-rc1 for 0.11.0-rc1 etc).

>  I think we should really try hard to make these dates.  I only have a few
> things that I would like to see happen before forking stable-0.11.  Namely:
>
>  o Setup qemu.org infrastructure (git hosting, wiki)
>  o Setup qemu bug tracker (see next mail)
>  o Include all ROM source code in tree via git submodules.  This is a major
> headache for distributors and I think it's important to resolve before our
> next release.

I think this is great, but OpenBIOS still uses Subversion. Can git use
SVN submodules for example?
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/3] eventfd: add internal reference counting to fix notifier race conditions

2009-06-23 Thread Michael S. Tsirkin

On Tue, Jun 23, 2009 at 07:35:00AM -0700, Davide Libenzi wrote:
> On Tue, 23 Jun 2009, Gregory Haskins wrote:
> 
> > Davide Libenzi wrote:
> > > On Mon, 22 Jun 2009, Gregory Haskins wrote:
> > >
> > >   
> > >> To be honest, I am not sure.  I would guess its not a *huge* deal,
> > >> though it was obviously enough of a concern to at least discuss it.  I
> > >> can definitely say that I think the other issues which are being fixed
> > >> are substantially more important.
> > >> 
> > >
> > > Ok then, will repost the revised patch later today.
> > >   
> > 
> > Ok sounds good.  I did have a chance to take a closer look at your
> > proposal for the key data, and I think it makes a lot of sense.  Do you
> > see it as a problem if we defer adding that?  Or should we try to add
> > that notion now?
> 
> That would need to go eventually via mainline, after some discussion. But 
> yes, I believe that using the "key" as simple bitmask is a little 
> restrictive.
> 
> 
> - Davide

I agree.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] virtio-serial: A guest <-> host interface for simple communication

2009-06-23 Thread Daniel P. Berrange

On Tue, Jun 23, 2009 at 01:55:52PM +0100, Paul Brook wrote:
> > Here are two patches. One implements a virtio-serial device in qemu
> > and the other is the driver for a guest kernel.
> 
> So I'll ask again. Why is this separate from virtio-console?

In the guest I wouldn't want virtio-serial devices to be mixed up with
the virtio-console device. virtio-console has nice clear usecase of
being an interactive console, and as such the guest OS can & should 
automatically start a mingetty/agetty process on any virtio-console
device it finds.  If we use virtio-console for data channels to, then
guest config becomes much harder todo automatically.

By all means share underlying code/infrastructure where appropriate,
but they must ultimately appear as clearly separate devices IMHO

Daniel
-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] kvm: remove in_range from kvm_io_device

2009-06-23 Thread Michael S. Tsirkin

Remove in_range from kvm_io_device and ask read/write callbacks, if
supplied, to perform range checks internally. This allows aliasing
(mostly for in-kernel virtio), as well as better error handling by
making it possible to pass errors up to userspace. And it's enough to
look at the diffstat to see that it's a better API anyway.

While we are at it, document locking rules for kvm_io_device.

Signed-off-by: Michael S. Tsirkin 
---

This is a replacement for my patch extending in_range.
This works for me, but please review carefully.

 arch/ia64/kvm/kvm-ia64.c  |   28 -
 arch/x86/kvm/i8254.c  |   49 --
 arch/x86/kvm/i8259.c  |   22 ++
 arch/x86/kvm/lapic.c  |   48 ++
 arch/x86/kvm/x86.c|  100 +++--
 include/linux/kvm_host.h  |6 ++-
 virt/kvm/coalesced_mmio.c |   26 +---
 virt/kvm/ioapic.c |   24 ++-
 virt/kvm/iodev.h  |   34 ++-
 virt/kvm/kvm_main.c   |   25 +++-
 10 files changed, 158 insertions(+), 204 deletions(-)

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index d20a5db..647ff91 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -198,16 +198,6 @@ int kvm_dev_ioctl_check_extension(long ext)
 
 }
 
-static struct kvm_io_device *vcpu_find_mmio_dev(struct kvm_vcpu *vcpu,
-   gpa_t addr, int len, int is_write)
-{
-   struct kvm_io_device *dev;
-
-   dev = kvm_io_bus_find_dev(&vcpu->kvm->mmio_bus, addr, len, is_write);
-
-   return dev;
-}
-
 static int handle_vm_error(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 {
kvm_run->exit_reason = KVM_EXIT_UNKNOWN;
@@ -219,6 +209,7 @@ static int handle_mmio(struct kvm_vcpu *vcpu, struct 
kvm_run *kvm_run)
 {
struct kvm_mmio_req *p;
struct kvm_io_device *mmio_dev;
+   int r;
 
p = kvm_get_vcpu_ioreq(vcpu);
 
@@ -235,16 +226,13 @@ static int handle_mmio(struct kvm_vcpu *vcpu, struct 
kvm_run *kvm_run)
kvm_run->exit_reason = KVM_EXIT_MMIO;
return 0;
 mmio:
-   mmio_dev = vcpu_find_mmio_dev(vcpu, p->addr, p->size, !p->dir);
-   if (mmio_dev) {
-   if (!p->dir)
-   kvm_iodevice_write(mmio_dev, p->addr, p->size,
-   &p->data);
-   else
-   kvm_iodevice_read(mmio_dev, p->addr, p->size,
-   &p->data);
-
-   } else
+   if (p->dir)
+   r = kvm_io_bus_read(&vcpu->kvm->mmio_bus, p->addr, p->size,
+   &p->data);
+   else
+   r = kvm_io_bus_write(&vcpu->kvm->mmio_bus, p->addr, p->size,
+&p->data);
+   if (r)
printk(KERN_ERR"kvm: No iodevice found! addr:%lx\n", p->addr);
p->state = STATE_IORESP_READY;
 
diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index c13bb92..d5881a4 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -336,8 +336,14 @@ void kvm_pit_load_count(struct kvm *kvm, int channel, u32 
val)
mutex_unlock(&kvm->arch.vpit->pit_state.lock);
 }
 
-static void pit_ioport_write(struct kvm_io_device *this,
-gpa_t addr, int len, const void *data)
+static inline int pit_in_range(gpa_t addr)
+{
+   return ((addr >= KVM_PIT_BASE_ADDRESS) &&
+   (addr < KVM_PIT_BASE_ADDRESS + KVM_PIT_MEM_LENGTH));
+}
+
+static int pit_ioport_write(struct kvm_io_device *this,
+   gpa_t addr, int len, const void *data)
 {
struct kvm_pit *pit = (struct kvm_pit *)this->private;
struct kvm_kpit_state *pit_state = &pit->pit_state;
@@ -345,6 +351,8 @@ static void pit_ioport_write(struct kvm_io_device *this,
int channel, access;
struct kvm_kpit_channel_state *s;
u32 val = *(u32 *) data;
+   if (!pit_in_range(addr))
+   return -EOPNOTSUPP;
 
val  &= 0xff;
addr &= KVM_PIT_CHANNEL_MASK;
@@ -407,16 +415,19 @@ static void pit_ioport_write(struct kvm_io_device *this,
}
 
mutex_unlock(&pit_state->lock);
+   return 0;
 }
 
-static void pit_ioport_read(struct kvm_io_device *this,
-   gpa_t addr, int len, void *data)
+static int pit_ioport_read(struct kvm_io_device *this,
+  gpa_t addr, int len, void *data)
 {
struct kvm_pit *pit = (struct kvm_pit *)this->private;
struct kvm_kpit_state *pit_state = &pit->pit_state;
struct kvm *kvm = pit->kvm;
int ret, count;
struct kvm_kpit_channel_state *s;
+   if (!pit_in_range(addr))
+   return -EOPNOTSUPP;
 
addr &= KVM_PIT_CHANNEL_MASK;
s = &pit_state->channels[addr];
@@ -471,37 +482,36 @@ static void pit_ioport_read(struct kvm_io_device *this,
memcpy(data, (cha

Re: [KVM PATCH v3 3/3] KVM: Fix races in irqfd using new eventfd_kref_get interface

2009-06-23 Thread Gregory Haskins

Gregory Haskins wrote:
> Michael S. Tsirkin wrote:
>   
>
>>  And note that multiple works can run on irqfd
>> in parallel.
>>   
>> 
>
> They can?  I thought work-queue items were guaranteed to only schedule
> once?  If what you say is true, its broken, I agree, and Ill need to
> revisit.  Let me get back to you.
>   

 Yep, you are right.  This is broken in my code. :(

Will address in the next spin.  Good catch.

-Greg





signature.asc
Description: OpenPGP digital signature

Re: [PATCH 3/3] eventfd: add internal reference counting to fix notifier race conditions

2009-06-23 Thread Gregory Haskins

Davide Libenzi wrote:
> On Tue, 23 Jun 2009, Gregory Haskins wrote:
>
>   
>> Davide Libenzi wrote:
>> 
>>> On Mon, 22 Jun 2009, Gregory Haskins wrote:
>>>
>>>   
>>>   
 To be honest, I am not sure.  I would guess its not a *huge* deal,
 though it was obviously enough of a concern to at least discuss it.  I
 can definitely say that I think the other issues which are being fixed
 are substantially more important.
 
 
>>> Ok then, will repost the revised patch later today.
>>>   
>>>   
>> Ok sounds good.  I did have a chance to take a closer look at your
>> proposal for the key data, and I think it makes a lot of sense.  Do you
>> see it as a problem if we defer adding that?  Or should we try to add
>> that notion now?
>> 
>
> That would need to go eventually via mainline, after some discussion. But 
> yes, I believe that using the "key" as simple bitmask is a little 
> restrictive.
>   

Ok.   As long as you do not think we are somehow painting ourselves into
a corner, lets just go with your original plan for the revised patch
sans key changes.

Thanks Davide,
-Greg




signature.asc
Description: OpenPGP digital signature

Re: [PATCH 3/3] eventfd: add internal reference counting to fix notifier race conditions

2009-06-23 Thread Davide Libenzi

On Tue, 23 Jun 2009, Gregory Haskins wrote:

> Davide Libenzi wrote:
> > On Mon, 22 Jun 2009, Gregory Haskins wrote:
> >
> >   
> >> To be honest, I am not sure.  I would guess its not a *huge* deal,
> >> though it was obviously enough of a concern to at least discuss it.  I
> >> can definitely say that I think the other issues which are being fixed
> >> are substantially more important.
> >> 
> >
> > Ok then, will repost the revised patch later today.
> >   
> 
> Ok sounds good.  I did have a chance to take a closer look at your
> proposal for the key data, and I think it makes a lot of sense.  Do you
> see it as a problem if we defer adding that?  Or should we try to add
> that notion now?

That would need to go eventually via mainline, after some discussion. But 
yes, I believe that using the "key" as simple bitmask is a little 
restrictive.


- Davide


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] virtio-serial: A guest <-> host interface for simple communication

2009-06-23 Thread Christian Bornträger

Am Dienstag 23 Juni 2009 16:16:13 schrieb Paul Brook:
> > I did some work on virtio-console, since kvm on s390 does not provide any
> > other. I dont think we should mix two different types of devices into one
> > driver. The only thing that these drivers have in common, is the fact
> > that there are two virtqueues, piping data (single bytes or larger
> > chunks). So you could make the same argument with the first virtio_net
> > driver (the one before GSO) - which is obviously wrong. The common part
> > of the transport is already factored out to virtio_ring and the
> > transports.
>
> virtio-net is packet based, not stream based.

You can argue that virtio-console is also packet based. The input buffer can 
accept up to 4K in one buffer and the console code can also submit larger 
chunks 
to virtio_console.

> > In addition there are two ABIs involved: a userspace ABI (/dev/hvc0) and
> > a guest/host ABI for this console. (and virtio was not meant to be a
> > KVM-only interface, that we can change all the time). David A. Wheeler's
> > 'SLOCCount' gives me 141 lines of code for virtio_console.c. I am quite
> > confident that the saving we could achieve by merging these two drivers
> > is not worth the hazzle.
>
> AFAICS the functionality provided is exactly the same. The host API is
> identical, and the guest userspace API only has trivial differences (which
> could be eliminated with a simple udev rule). By my reading virtio-serial
> makes virtio-console entirely redundant.

How can you know, that the userspace API only has trivial differences, if the 
question below is not answered?

> > Discussion about merging the console code into this distracts from the
> > main problem: To get the interface and functionality right before it
> > becomes an ABI (is it /dev/ttyS, network like or is it something
> > completely different?).
>
> Ah, now that's a different question. I don't know what the requirements are
> for the higher level vmchannel interface. However I also don't care.

You should care, because it might have an impact if two serial lines are really 
the right solution for the vmchannel.

One thing that I forgot:
You should be warned that hvc_console sometimes can be a real PITA. A while ago 
I tried to change virtio_console to support more than one console and hotplug 
and failed to find a proper solution that can handle all the subtle  
console/tty 
register/unregister combinations. You dont want to adopt new code to fit to 
hvc_console - leave it in virtio_console...


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/3] eventfd: add internal reference counting to fix notifier race conditions

2009-06-23 Thread Davide Libenzi

On Tue, 23 Jun 2009, Rusty Russell wrote:

> The first 'struct eventfd_ctx;' line is not required.

Will repost dropping that.


- Davide


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/3] eventfd: add internal reference counting to fix notifier race conditions

2009-06-23 Thread Gregory Haskins

Davide Libenzi wrote:
> On Mon, 22 Jun 2009, Gregory Haskins wrote:
>
>   
>> To be honest, I am not sure.  I would guess its not a *huge* deal,
>> though it was obviously enough of a concern to at least discuss it.  I
>> can definitely say that I think the other issues which are being fixed
>> are substantially more important.
>> 
>
> Ok then, will repost the revised patch later today.
>   

Ok sounds good.  I did have a chance to take a closer look at your
proposal for the key data, and I think it makes a lot of sense.  Do you
see it as a problem if we defer adding that?  Or should we try to add
that notion now?

-Greg

signature.asc
Description: OpenPGP digital signature

Re: [PATCH 3/3] eventfd: add internal reference counting to fix notifier race conditions

2009-06-23 Thread Davide Libenzi

On Mon, 22 Jun 2009, Gregory Haskins wrote:

> To be honest, I am not sure.  I would guess its not a *huge* deal,
> though it was obviously enough of a concern to at least discuss it.  I
> can definitely say that I think the other issues which are being fixed
> are substantially more important.

Ok then, will repost the revised patch later today.


- Davide


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/2 v2] allow hypervisor CPUID bit to be overriden

2009-06-23 Thread Anthony Liguori


Andre Przywara wrote:

KVM defaults to the hypervisor CPUID bit to be set, whereas pure
QEMU clears it. On some occasions one wants to set or clear it the
other way round (for instance to get HyperV running inside a guest).

Move the bit-set to be done before the command line parsing and
enable it by default. One can disable it by using: -cpu qemu64,-hypervisor
Fix some whitespace damage on the way.

Signed-off-by: Andre Przywara 

  

Please also resend 2/2.


--
Regards,

Anthony Liguori

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] virtio-serial: A guest <-> host interface for simple communication

2009-06-23 Thread Paul Brook

On Tuesday 23 June 2009, Christian Bornträger wrote:
> Am Dienstag 23 Juni 2009 14:55:52 schrieb Paul Brook:
> > > Here are two patches. One implements a virtio-serial device in qemu
> > > and the other is the driver for a guest kernel.
> >
> > So I'll ask again. Why is this separate from virtio-console?
>
> I did some work on virtio-console, since kvm on s390 does not provide any
> other. I dont think we should mix two different types of devices into one
> driver. The only thing that these drivers have in common, is the fact that
> there are two virtqueues, piping data (single bytes or larger chunks). So
> you could make the same argument with the first virtio_net driver (the one
> before GSO) - which is obviously wrong. The common part of the transport is
> already factored out to virtio_ring and the transports.

virtio-net is packet based, not stream based.

> In addition there are two ABIs involved: a userspace ABI (/dev/hvc0) and a
> guest/host ABI for this console. (and virtio was not meant to be a KVM-only
> interface, that we can change all the time). David A. Wheeler's 'SLOCCount'
> gives me 141 lines of code for virtio_console.c. I am quite confident that
> the saving we could achieve by merging these two drivers is not worth the
> hazzle.

AFAICS the functionality provided is exactly the same. The host API is 
identical, and the guest userspace API only has trivial differences (which 
could be eliminated with a simple udev rule). By my reading virtio-serial 
makes virtio-console entirely redundant.

> Discussion about merging the console code into this distracts from the main
> problem: To get the interface and functionality right before it becomes an
> ABI (is it /dev/ttyS, network like or is it something completely
> different?).

Ah, now that's a different question. I don't know what the requirements are 
for the higher level vmchannel interface. However I also don't care.

Paul
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] virtio-serial: A guest <-> host interface for simple communication

2009-06-23 Thread Christian Bornträger

Am Dienstag 23 Juni 2009 14:55:52 schrieb Paul Brook:
> > Here are two patches. One implements a virtio-serial device in qemu
> > and the other is the driver for a guest kernel.
>
> So I'll ask again. Why is this separate from virtio-console?

I did some work on virtio-console, since kvm on s390 does not provide any other.
I dont think we should mix two different types of devices into one driver. The 
only thing that these drivers have in common, is the fact that there are two 
virtqueues, piping data (single bytes or larger chunks). So you could make the 
same argument with the first virtio_net driver (the one before GSO) - which is 
obviously wrong. The common part of the transport is already factored out to 
virtio_ring and the transports.

In addition there are two ABIs involved: a userspace ABI (/dev/hvc0) and a 
guest/host ABI for this console. (and virtio was not meant to be a KVM-only 
interface, that we can change all the time). David A. Wheeler's 'SLOCCount' 
gives me 141 lines of code for virtio_console.c. I am quite confident that the 
saving we could achieve by merging these two drivers is not worth the hazzle.

Discussion about merging the console code into this distracts from the main 
problem: To get the interface and functionality right before it becomes an ABI 
(is it /dev/ttyS, network like or is it something completely different?).

Christian



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [KVM PATCH v8 3/3] KVM: add iosignalfd support

2009-06-23 Thread Gregory Haskins

Michael S. Tsirkin wrote:
> On Thu, Jun 18, 2009 at 08:30:46PM -0400, Gregory Haskins wrote:
>   
>> +static int
>> +iosignalfd_group_in_range(struct kvm_io_device *this, gpa_t addr, int len,
>> +  int is_write)
>> +{
>> +struct _iosignalfd_group *p = to_group(this);
>> +
>> +return ((addr >= p->addr && (addr < p->addr + p->length)));
>> +}
>> 
>
> I think I see a problem here. For virtio, we do not necessarily want all
> virtqueues for a device to live in kernel: there might be control
> virtqueues that we want to leave in userspace. Since this claims all
> writes to a specific address, the signal never makes it to userspace.
>
>   
So based on this, I think you are right about the io_bus changes.  If we
accept your proposal this problem above is solved cleanly.  Sorry for
the resistance, but I just wanted to make sure we were doing the right
thing.  I am in agreement now.

Kind Regards,
-Greg



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] virtio-serial: A guest <-> host interface for simple communication

2009-06-23 Thread Paul Brook

On Tuesday 23 June 2009, Amit Shah wrote:
> On (Tue) Jun 23 2009 [13:55:52], Paul Brook wrote:
> > > Here are two patches. One implements a virtio-serial device in qemu
> > > and the other is the driver for a guest kernel.
> >
> > So I'll ask again. Why is this separate from virtio-console?
>
> I'm basically writing a vmchannel and found out that a lot can be shared
> between some virtio devices. So I'm just trying to abstract out those
> things in virtio-serial. Once we're sure virtio-serial is good and ready
> to be merged, I will look at converting over virtio-console to the
> virtio-serial interface.

That doesn't really answer my question. We already have a virtual serial 
device (called virtio-console). Why are you inventing another one?

Paul

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] virtio-serial: A guest <-> host interface for simple communication

2009-06-23 Thread Amit Shah

On (Tue) Jun 23 2009 [13:55:52], Paul Brook wrote:
> > Here are two patches. One implements a virtio-serial device in qemu
> > and the other is the driver for a guest kernel.
> 
> So I'll ask again. Why is this separate from virtio-console?

I'm basically writing a vmchannel and found out that a lot can be shared
between some virtio devices. So I'm just trying to abstract out those
things in virtio-serial. Once we're sure virtio-serial is good and ready
to be merged, I will look at converting over virtio-console to the
virtio-serial interface.

Is that an acceptable approach?

Thanks,
Amit
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] virtio-serial: A guest <-> host interface for simple communication

2009-06-23 Thread Paul Brook

> Here are two patches. One implements a virtio-serial device in qemu
> and the other is the driver for a guest kernel.

So I'll ask again. Why is this separate from virtio-console?

Paul
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] virtio-serial: virtio device for simple host <-> guest communication

2009-06-23 Thread Amit Shah

This interface presents a char device from which bits can be
sent and read.

Sample uses for such a device can be obtaining info from the
guest like the file systems used, apps installed, etc. for
offline usage and logged-in users, clipboard copy-paste, etc.
for online usage.

Signed-off-by: Amit Shah 
---
 Makefile.target|2 +-
 hw/pc.c|   17 +++
 hw/pci.h   |1 +
 hw/qdev.c  |6 +-
 hw/virtio-pci.c|   15 +++
 hw/virtio-serial.c |  354 
 hw/virtio-serial.h |   39 ++
 hw/virtio.h|1 +
 monitor.c  |9 ++
 qemu-monitor.hx|7 +
 qemu-options.hx|8 ++
 sysemu.h   |   11 ++
 vl.c   |   63 +
 13 files changed, 531 insertions(+), 2 deletions(-)
 create mode 100644 hw/virtio-serial.c
 create mode 100644 hw/virtio-serial.h

diff --git a/Makefile.target b/Makefile.target
index 08121a9..7a0deb3 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -550,7 +550,7 @@ OBJS=vl.o osdep.o monitor.o pci.o loader.o isa_mmio.o 
machine.o \
  gdbstub.o gdbstub-xml.o
 # virtio has to be here due to weird dependency between PCI and virtio-net.
 # need to fix this properly
-OBJS+=virtio-blk.o virtio-balloon.o virtio-net.o virtio-console.o
+OBJS+=virtio-blk.o virtio-balloon.o virtio-net.o virtio-console.o 
virtio-serial.o
 ifdef CONFIG_KVM
 OBJS+=kvm.o kvm-all.o
 endif
diff --git a/hw/pc.c b/hw/pc.c
index cb5b4d0..6ccf4b8 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -40,6 +40,8 @@
 
 #include "qemu-kvm.h"
 
+void *virtio_serial_new_port(PCIDevice *dev, int idx, char *name);
+
 /* output Bochs bios info messages */
 //#define DEBUG_BIOS
 
@@ -1204,6 +1206,21 @@ static void pc_init1(ram_addr_t ram_size,
 }
 }
 
+/* Add virtio serial devices */
+if (pci_enabled && virtio_serial_index) {
+void *dev;
+
+dev = pci_create_simple(pci_bus, -1, "virtio-serial-pci");
+if (!dev) {
+fprintf(stderr, "qemu: could not create virtio serial pci 
device\n");
+exit(1);
+}
+
+for (i = 0; i < virtio_serial_index; i++) {
+virtio_serial_new_port(dev, i, virtio_serial_names[i]);
+}
+}
+
 #ifdef USE_KVM_DEVICE_ASSIGNMENT
 if (kvm_enabled()) {
 add_assigned_devices(pci_bus, assigned_devices, 
assigned_devices_index);
diff --git a/hw/pci.h b/hw/pci.h
index d835b6d..fe5f29e 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -72,6 +72,7 @@ extern target_phys_addr_t pci_mem_base;
 #define PCI_DEVICE_ID_VIRTIO_BLOCK   0x1001
 #define PCI_DEVICE_ID_VIRTIO_BALLOON 0x1002
 #define PCI_DEVICE_ID_VIRTIO_CONSOLE 0x1003
+#define PCI_DEVICE_ID_VIRTIO_SERIAL  0x1004
 
 typedef void PCIConfigWriteFunc(PCIDevice *pci_dev,
 uint32_t address, uint32_t data, int len);
diff --git a/hw/qdev.c b/hw/qdev.c
index 385e709..fe5bcd0 100644
--- a/hw/qdev.c
+++ b/hw/qdev.c
@@ -168,9 +168,13 @@ CharDriverState *qdev_init_chardev(DeviceState *dev)
 {
 static int next_serial;
 static int next_virtconsole;
+static int next_virtserial;
+
 /* FIXME: This is a nasty hack that needs to go away.  */
-if (strncmp(dev->type->info->name, "virtio", 6) == 0) {
+if (strncmp(dev->type->info->name, "virtio-console", 14) == 0) {
 return virtcon_hds[next_virtconsole++];
+} else if (strncmp(dev->type->info->name, "virtio-serial", 13) == 0) {
+return virtio_serial_hds[next_virtserial++];
 } else {
 return serial_hds[next_serial++];
 }
diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 24fe837..199f0c6 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -334,6 +334,19 @@ static void virtio_balloon_init_pci(PCIDevice *pci_dev)
 0x00);
 }
 
+static void virtio_serial_init_pci(PCIDevice *pci_dev)
+{
+VirtIOPCIProxy *proxy = DO_UPCAST(VirtIOPCIProxy, pci_dev, pci_dev);
+VirtIODevice *vdev;
+
+vdev = virtio_serial_init(&pci_dev->qdev);
+virtio_init_pci(proxy, vdev,
+PCI_VENDOR_ID_REDHAT_QUMRANET,
+PCI_DEVICE_ID_VIRTIO_SERIAL,
+PCI_CLASS_COMMUNICATION_OTHER,
+0x00);
+}
+
 static void virtio_pci_register_devices(void)
 {
 pci_qdev_register("virtio-blk-pci", sizeof(VirtIOPCIProxy),
@@ -344,6 +357,8 @@ static void virtio_pci_register_devices(void)
   virtio_console_init_pci);
 pci_qdev_register("virtio-balloon-pci", sizeof(VirtIOPCIProxy),
   virtio_balloon_init_pci);
+pci_qdev_register("virtio-serial-pci", sizeof(VirtIOPCIProxy),
+  virtio_serial_init_pci);
 }
 
 device_init(virtio_pci_register_devices)
diff --git a/hw/virtio-serial.c b/hw/virtio-serial.c
new file mode 100644
index 000..5162316
--- /dev/null
+++ b/hw/virtio-serial.c
@@ -0,0 +1,354 @@
+/*
+ * Virtio serial interface
+ *
+ * This serial interface is a paravirtualised guest<->host
+ *

[PATCH] virtio_serial: A char device for simple guest <-> host communication

2009-06-23 Thread Amit Shah

We expose multiple char devices ("ports") for simple communication
between the host userspace and guest.

This is mainly intended for obtaining information from the guest.
Sample offline usages can be: poking around in a guest to find out
the file systems used, applications installed, etc. Online usages
can be sharing of clipboard data between the guest and the host,
sending information about logged-in users to the host, locking the
screen or session when a vnc session is closed, and so on.

The interface presented to guest userspace is of a simple char
device, so it can be used like this:

fd = open("/dev/vmch0", O_RDWR);
ret = read(fd, buf, 100);
ret = write(fd, string, strlen(string));
ret = ioctl(fd, VIRTIO_SERIAL_IOCTL_GET_PORT_NAME, &port_name);

Signed-off-by: Amit Shah 
---
 drivers/char/Kconfig  |6 +
 drivers/char/Makefile |1 +
 drivers/char/virtio_serial.c  |  571 +
 include/linux/virtio_serial.h |   42 +++
 4 files changed, 620 insertions(+), 0 deletions(-)
 create mode 100644 drivers/char/virtio_serial.c
 create mode 100644 include/linux/virtio_serial.h

diff --git a/drivers/char/Kconfig b/drivers/char/Kconfig
index 30bae6d..84a1718 100644
--- a/drivers/char/Kconfig
+++ b/drivers/char/Kconfig
@@ -679,6 +679,12 @@ config VIRTIO_CONSOLE
help
  Virtio console for use with lguest and other hypervisors.
 
+config VIRTIO_SERIAL
+   tristate "Virtio serial"
+   select VIRTIO
+   select VIRTIO_RING
+   help
+ Virtio serial device driver for simple guest and host communication
 
 config HVCS
tristate "IBM Hypervisor Virtual Console Server support"
diff --git a/drivers/char/Makefile b/drivers/char/Makefile
index 189efcf..0b6c71e 100644
--- a/drivers/char/Makefile
+++ b/drivers/char/Makefile
@@ -54,6 +54,7 @@ obj-$(CONFIG_HVC_XEN) += hvc_xen.o
 obj-$(CONFIG_HVC_IUCV) += hvc_iucv.o
 obj-$(CONFIG_HVC_UDBG) += hvc_udbg.o
 obj-$(CONFIG_VIRTIO_CONSOLE)   += virtio_console.o
+obj-$(CONFIG_VIRTIO_SERIAL)+= virtio_serial.o
 obj-$(CONFIG_RAW_DRIVER)   += raw.o
 obj-$(CONFIG_SGI_SNSC) += snsc.o snsc_event.o
 obj-$(CONFIG_MSPEC)+= mspec.o
diff --git a/drivers/char/virtio_serial.c b/drivers/char/virtio_serial.c
new file mode 100644
index 000..6c5ad5d
--- /dev/null
+++ b/drivers/char/virtio_serial.c
@@ -0,0 +1,571 @@
+/*
+ * VirtIO Serial driver
+ *
+ * This is paravirtualised serial guest<->host communication channel
+ * for relaying short messages and events in either direction.
+ *
+ * One PCI device can have multiple serial ports so that different
+ * applications can communicate without polluting the PCI device space
+ * in the guest.
+ *
+ * Copyright (C) 2009, Red Hat, Inc.
+ *
+ * Author(s): Amit Shah 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct virtio_serial_struct {
+   struct virtio_device *vdev;
+   struct virtqueue *ctrl_vq;
+   struct virtqueue *in_vq, *out_vq;
+   struct virtio_serial_port *ports;
+   u32 nr_ports;
+   struct work_struct rx_work;
+   struct work_struct queue_work;
+};
+
+struct virtio_serial_port_buffer {
+   unsigned int len;
+   unsigned int offset;
+   char *buf;
+
+   struct list_head list;
+};
+
+struct virtio_serial_id {
+   u32 id; /* Port number */
+};
+
+struct virtio_serial_port {
+   struct virtio_serial_struct *virtserial;
+
+   /* The name given to this channel, if any. NOT zero-terminated */
+   char *name;
+
+   /* Each port associates with a separate char device */
+   dev_t dev;
+   struct cdev cdev;
+
+   /* Buffer management */
+   struct virtio_serial_port_buffer read_buf;
+   struct list_head readbuf_head;
+};
+
+static struct virtio_serial_struct virtserial;
+
+/* This should become per-port data */
+static DECLARE_COMPLETION(have_data);
+
+static int major = 60; /* from the experimental range */
+
+static struct virtio_serial_port *get_port_from_id(u32 id)
+{
+   if (id > virtserial.nr_ports)
+   return NULL;
+
+   return &virtserial.ports[id];
+}
+
+static int get_id_from_port(struct virt

virtio-serial: A guest <-> host interface for simple communication

2009-06-23 Thread Amit Shah


Hello,

Here are two patches. One implements a virtio-serial device in qemu
and the other is the driver for a guest kernel.

While working on a vmchannel interface that is needed for communication
between guest userspace and host userspace, I saw that most of the
interface can be abstracted out as a "serial" device with "ports".

Some requirements for a vmchannel are listed at this page:
http://kvm.et.redhat.com/page/VMchannel_Requirements

A few sample uses for a vmchannel are to share the host and guest
clipboards (to allow copy/paste between a host and a guest), to
lock the screen of the guest session when the vnc viewer is closed,
to find out which applications are installed on a guest OS even when
the guest is powered down (using virt-inspector) and so on.

At this time, the qemu device is more complete than the guest driver
is. However, I'd like you to review both the patches for inclusion.

Thanks,
Amit.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/2 v2] allow hypervisor CPUID bit to be overriden

2009-06-23 Thread Andre Przywara

KVM defaults to the hypervisor CPUID bit to be set, whereas pure
QEMU clears it. On some occasions one wants to set or clear it the
other way round (for instance to get HyperV running inside a guest).

Move the bit-set to be done before the command line parsing and
enable it by default. One can disable it by using: -cpu qemu64,-hypervisor
Fix some whitespace damage on the way.

Signed-off-by: Andre Przywara 
---
 target-i386/helper.c |   23 +++
 1 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/target-i386/helper.c b/target-i386/helper.c
index 8a76abd..cb9113e 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -44,7 +44,7 @@ static const char *ext_feature_name[] = {
 "pni" /* Intel,AMD sse3 */, NULL, NULL, "monitor", "ds_cpl", "vmx", NULL 
/* Linux smx */, "est",
 "tm2", "ssse3", "cid", NULL, NULL, "cx16", "xtpr", NULL,
 NULL, NULL, "dca", NULL, NULL, NULL, NULL, "popcnt",
-   NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL, NULL, NULL, NULL, "hypervisor",
 };
 static const char *ext2_feature_name[] = {
 "fpu", "vme", "de", "pse", "tsc", "msr", "pae", "mce",
@@ -59,30 +59,30 @@ static const char *ext3_feature_name[] = {
 NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
 };
 
-static void add_flagname_to_bitmaps(char *flagname, uint32_t *features, 
-uint32_t *ext_features, 
-uint32_t *ext2_features, 
+static void add_flagname_to_bitmaps(const char *flagname, uint32_t *features,
+uint32_t *ext_features,
+uint32_t *ext2_features,
 uint32_t *ext3_features)
 {
 int i;
 int found = 0;
 
-for ( i = 0 ; i < 32 ; i++ ) 
+for ( i = 0 ; i < 32 ; i++ )
 if (feature_name[i] && !strcmp (flagname, feature_name[i])) {
 *features |= 1 << i;
 found = 1;
 }
-for ( i = 0 ; i < 32 ; i++ ) 
+for ( i = 0 ; i < 32 ; i++ )
 if (ext_feature_name[i] && !strcmp (flagname, ext_feature_name[i])) {
 *ext_features |= 1 << i;
 found = 1;
 }
-for ( i = 0 ; i < 32 ; i++ ) 
+for ( i = 0 ; i < 32 ; i++ )
 if (ext2_feature_name[i] && !strcmp (flagname, ext2_feature_name[i])) {
 *ext2_features |= 1 << i;
 found = 1;
 }
-for ( i = 0 ; i < 32 ; i++ ) 
+for ( i = 0 ; i < 32 ; i++ )
 if (ext3_feature_name[i] && !strcmp (flagname, ext3_feature_name[i])) {
 *ext3_features |= 1 << i;
 found = 1;
@@ -330,6 +330,9 @@ static int cpu_x86_find_by_name(x86_def_t *x86_cpu_def, 
const char *cpu_model)
 goto error;
 memcpy(x86_cpu_def, def, sizeof(*def));
 
+add_flagname_to_bitmaps("hypervisor", &plus_features,
+&plus_ext_features, &plus_ext2_features, &plus_ext3_features);
+
 featurestr = strtok(NULL, ",");
 
 while (featurestr) {
@@ -1523,10 +1526,6 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 *ebx = (env->cpuid_apic_id << 24) | 8 << 8; /* CLFLUSH size in quad 
words, Linux wants it. */
 *ecx = env->cpuid_ext_features;
 *edx = env->cpuid_features;
-
-/* "Hypervisor present" bit required for Microsoft SVVP */
-if (kvm_enabled())
-*ecx |= (1 << 31);
 break;
 case 2:
 /* cache info: needed for Pentium Pro compatibility */
-- 
1.6.1.3


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC] pass write value to in_range pointers

2009-06-23 Thread Avi Kivity


On 06/23/2009 03:01 PM, Gregory Haskins wrote:

Ok, so for now I will just crank up the io_bus array, and we can address
scale another day.  Can I just drop patch 2/3 and let the io_bus govern
the limit?
   


So long as we have a runtime-discoverable limit, yes.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] Planning for the 0.11.0 release

2009-06-23 Thread Avi Kivity


On 06/23/2009 02:57 AM, Anthony Liguori wrote:

Hi,

It's getting to be about the time to start thinking about the 0.11.0 
release.  0.10.0 was released on March 2nd so following with the 6 
month release cycle, that would put 0.11.0 at September 2nd.


Based on the experiences with the stable releases, here's what I'd 
recommend:




Seems sane to me.


o Release qemu-0.11.0-rc1
o Release additional -rcN releases every 1-2 weeks


I'll release qemu-kvm-0.11.0-rc* in parallel so we can get feedback from 
kvm users as well.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC] pass write value to in_range pointers

2009-06-23 Thread Gregory Haskins

Avi Kivity wrote:
> On 06/23/2009 02:44 PM, Michael S. Tsirkin wrote:
>> On Tue, Jun 23, 2009 at 12:04:06AM -0400, Gregory Haskins wrote:
>>   
>   It will also need to support
> multiple matches.
>
>  
 What, signal many fds on the same address/value pair?
 I see this as a bug. Why is this a good thing to support?
 Just increases the chance of leaking this fd.


>>> I believe Avi asked for this feature specifically, so I will defer
>>> to him.
>>>  
>>
>> Hmm. That's hard to implement in my model. Avi, can we give up
>> this feature? I don't think anyone needs this specifically ...
>>
>
> I think we can make do with passing that single eventfd to multiple
> consumers.  It means their event count reads may return zero, but I
> guess we can live with that.
>
> I do want to retain flexibility in how we route events.
>

Ok, so for now I will just crank up the io_bus array, and we can address
scale another day.  Can I just drop patch 2/3 and let the io_bus govern
the limit?

-Greg



signature.asc
Description: OpenPGP digital signature

Re: [PATCH RFC] pass write value to in_range pointers

2009-06-23 Thread Avi Kivity


On 06/23/2009 02:44 PM, Michael S. Tsirkin wrote:

On Tue, Jun 23, 2009 at 12:04:06AM -0400, Gregory Haskins wrote:
   

  It will also need to support
multiple matches.

 

What, signal many fds on the same address/value pair?
I see this as a bug. Why is this a good thing to support?
Just increases the chance of leaking this fd.

   

I believe Avi asked for this feature specifically, so I will defer to him.
 


Hmm. That's hard to implement in my model. Avi, can we give up
this feature? I don't think anyone needs this specifically ...
   


I think we can make do with passing that single eventfd to multiple 
consumers.  It means their event count reads may return zero, but I 
guess we can live with that.


I do want to retain flexibility in how we route events.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC] pass write value to in_range pointers

2009-06-23 Thread Gregory Haskins

Michael S. Tsirkin wrote:
> On Tue, Jun 23, 2009 at 12:04:06AM -0400, Gregory Haskins wrote:
>   
  It will also need to support
 multiple matches.
 
 
>>> What, signal many fds on the same address/value pair?
>>> I see this as a bug. Why is this a good thing to support?
>>> Just increases the chance of leaking this fd.
>>>   
>>>   
>> I believe Avi asked for this feature specifically, so I will defer to him.
>> 
>
> Hmm. That's hard to implement in my model. Avi, can we give up
> this feature? I don't think anyone needs this specifically ...
>
>   

If we can drop this it will simplify the mods I will need to do to
io_bus to support Michael's new interface.

Out of curiosity, what is it you are trying to build Michael?

-Greg



signature.asc
Description: OpenPGP digital signature

Re: [PATCH RFC] pass write value to in_range pointers

2009-06-23 Thread Avi Kivity


On 06/23/2009 07:04 AM, Gregory Haskins wrote:

Well, for one its not very clear what the benefit of the read/write
aliasing even is. ;)  Apparently coalesced_mmio uses it, but even so I
doubt that is for the purposes of having one device do reads while
another does writes.  I could be wrong, though.
   


Coalesced mmio cannot handle reads.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC] pass write value to in_range pointers

2009-06-23 Thread Michael S. Tsirkin

On Tue, Jun 23, 2009 at 07:41:12AM -0400, Gregory Haskins wrote:
> Avi Kivity wrote:
> > On 06/22/2009 07:08 PM, Michael S. Tsirkin wrote:
> >> On Mon, Jun 22, 2009 at 11:45:00AM -0400, Gregory Haskins wrote:
> >>   
> >>> Michael S. Tsirkin wrote:
> >>> 
>  It seems that a lot of complexity and trickiness with iosignalfd is
>  handling the group/item relationship, which comes about because kvm
>  does
>  not currently let a device on the bus claim a write transaction
>  based on the
>  value written.  This could be greatly simplified if the value written
>  was passed to the in_range check for write operation.  We could then
>  simply make each kvm_iosignalfd a device on the bus.
> 
>  What does everyone think of the following lightly tested patch?
> 
> 
> >>> Hi Michael,
> >>>Its interesting, but I am not convinced its necessary.  We
> >>> created the
> >>> group/item layout because iosignalfds are unique in that they are
> >>> probably the only IO device that wants to do some kind of address
> >>> aliasing.
> >>>  
> >>
> >> We actually already have aliasing: is_write flag is used for this
> >> purpose. Actually, it's possible to remove is_write by passing
> >> a null pointer in write_val for reads. I like this a bit less as
> >> the code generated is less compact ... Avi, what do you think?
> >>
> >
> > Greg, won't Michael's patch eliminate a big chunk from your iosignalfd
> > patches?  Seems like a win to me.
> 
> Well, it really just moves that hunk from eventfd.c to kvm_main.c, where
> I don't think anyone else will use it by iosignalfd.  But if that is
> what everyone wants, I guess I have no choice.

Wait a bit before you start rebasing though please.
I just had a brainwave and is rewriting this patch.

> >
> >> One is enough :)
> >> Seriously, do you see that this saves you all of RCU, linked lists and
> >> counters? You don't need to keep track of iofds, you don't need to
> >> implement your own lookup logic - you just use the kvm device
> >> and that's it.
> >>
> >>
> >
> > Yup.
> >
> 
> 


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC] pass write value to in_range pointers

2009-06-23 Thread Michael S. Tsirkin

On Tue, Jun 23, 2009 at 12:04:06AM -0400, Gregory Haskins wrote:
> >>  It will also need to support
> >> multiple matches.
> >> 
> >
> > What, signal many fds on the same address/value pair?
> > I see this as a bug. Why is this a good thing to support?
> > Just increases the chance of leaking this fd.
> >   
> 
> I believe Avi asked for this feature specifically, so I will defer to him.

Hmm. That's hard to implement in my model. Avi, can we give up
this feature? I don't think anyone needs this specifically ...

-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] Re: [PATCH 1/2] allow hypervisor CPUID bit to be overriden

2009-06-23 Thread Paul Brook

> I agree it's pointless, but it is a Microsoft requirement for passing
> their SVVP tests.  Enabling it by default makes life a little easier for
> users who wish to validate their hypervisor and has no drawbacks.

I wasn't arguing against setting it by default (for QEMU CPU types), just 
against enabling it unconditionally.

Paul
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC] pass write value to in_range pointers

2009-06-23 Thread Gregory Haskins

Avi Kivity wrote:
> On 06/22/2009 07:08 PM, Michael S. Tsirkin wrote:
>> On Mon, Jun 22, 2009 at 11:45:00AM -0400, Gregory Haskins wrote:
>>   
>>> Michael S. Tsirkin wrote:
>>> 
 It seems that a lot of complexity and trickiness with iosignalfd is
 handling the group/item relationship, which comes about because kvm
 does
 not currently let a device on the bus claim a write transaction
 based on the
 value written.  This could be greatly simplified if the value written
 was passed to the in_range check for write operation.  We could then
 simply make each kvm_iosignalfd a device on the bus.

 What does everyone think of the following lightly tested patch?


>>> Hi Michael,
>>>Its interesting, but I am not convinced its necessary.  We
>>> created the
>>> group/item layout because iosignalfds are unique in that they are
>>> probably the only IO device that wants to do some kind of address
>>> aliasing.
>>>  
>>
>> We actually already have aliasing: is_write flag is used for this
>> purpose. Actually, it's possible to remove is_write by passing
>> a null pointer in write_val for reads. I like this a bit less as
>> the code generated is less compact ... Avi, what do you think?
>>
>
> Greg, won't Michael's patch eliminate a big chunk from your iosignalfd
> patches?  Seems like a win to me.

Well, it really just moves that hunk from eventfd.c to kvm_main.c, where
I don't think anyone else will use it by iosignalfd.  But if that is
what everyone wants, I guess I have no choice.

>
>> One is enough :)
>> Seriously, do you see that this saves you all of RCU, linked lists and
>> counters? You don't need to keep track of iofds, you don't need to
>> implement your own lookup logic - you just use the kvm device
>> and that's it.
>>
>>
>
> Yup.
>




signature.asc
Description: OpenPGP digital signature

Re: [KVM PATCH v8 3/3] KVM: add iosignalfd support

2009-06-23 Thread Gregory Haskins

Michael S. Tsirkin wrote:
> On Tue, Jun 23, 2009 at 07:33:07AM -0400, Gregory Haskins wrote:
>   
>> Michael S. Tsirkin wrote:
>> 
>>> On Thu, Jun 18, 2009 at 08:30:46PM -0400, Gregory Haskins wrote:
>>>   
>>>   
 +static int
 +iosignalfd_group_in_range(struct kvm_io_device *this, gpa_t addr, int len,
 +int is_write)
 +{
 +  struct _iosignalfd_group *p = to_group(this);
 +
 +  return ((addr >= p->addr && (addr < p->addr + p->length)));
 +}
 
 
>>> I think I see a problem here. For virtio, we do not necessarily want all
>>> virtqueues for a device to live in kernel: there might be control
>>> virtqueues that we want to leave in userspace. Since this claims all
>>> writes to a specific address, the signal never makes it to userspace.
>>>
>>>   
>>>   
>> You can use a wildcard.  Would that work?
>>
>> 
>
> Nope, you need to know the value written.
>
>   
Note: You can also terminate an eventfd in userspace...it doesn't have
to terminate in-kernel.  Not sure if that helps.



signature.asc
Description: OpenPGP digital signature

Re: [KVM PATCH v8 3/3] KVM: add iosignalfd support

2009-06-23 Thread Michael S. Tsirkin

On Tue, Jun 23, 2009 at 07:33:07AM -0400, Gregory Haskins wrote:
> Michael S. Tsirkin wrote:
> > On Thu, Jun 18, 2009 at 08:30:46PM -0400, Gregory Haskins wrote:
> >   
> >> +static int
> >> +iosignalfd_group_in_range(struct kvm_io_device *this, gpa_t addr, int len,
> >> +int is_write)
> >> +{
> >> +  struct _iosignalfd_group *p = to_group(this);
> >> +
> >> +  return ((addr >= p->addr && (addr < p->addr + p->length)));
> >> +}
> >> 
> >
> > I think I see a problem here. For virtio, we do not necessarily want all
> > virtqueues for a device to live in kernel: there might be control
> > virtqueues that we want to leave in userspace. Since this claims all
> > writes to a specific address, the signal never makes it to userspace.
> >
> >   
> You can use a wildcard.  Would that work?
> 

Nope, you need to know the value written.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] Re: [PATCH 1/2] allow hypervisor CPUID bit to be overriden

2009-06-23 Thread Avi Kivity


On 06/23/2009 02:31 PM, Paul Brook wrote:

On Tuesday 23 June 2009, Avi Kivity wrote:
   

On 06/23/2009 12:47 AM, Andre Przywara wrote:
 

KVM defaults to the hypervisor CPUID bit to be set, whereas pure QEMU
clears it. On some occasions one want to set or clear it the other way
round (for instance to get HyperV running inside a guest).
Allow the default to be overridden on the command line and fix some
whitespace damage on the way.
   

It makes sense for qemu to set the hypervisor bit unconditionally.  A
guest running under qemu is not bare metal.
 


I see no reason why a guest has to be told that it's running inside a VM.
In principle an appropriately configured qemu should be indistinguishable from
real hardware. In practice it's technically infeasible to cover absolutely
everything, but if we set this bit we're not even trying.

I have no objection to the bit being set by default for the QEMU CPU types.
   


I agree it's pointless, but it is a Microsoft requirement for passing 
their SVVP tests.  Enabling it by default makes life a little easier for 
users who wish to validate their hypervisor and has no drawbacks.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [KVM PATCH v8 3/3] KVM: add iosignalfd support

2009-06-23 Thread Gregory Haskins

Michael S. Tsirkin wrote:
> On Thu, Jun 18, 2009 at 08:30:46PM -0400, Gregory Haskins wrote:
>   
>> +static int
>> +iosignalfd_group_in_range(struct kvm_io_device *this, gpa_t addr, int len,
>> +  int is_write)
>> +{
>> +struct _iosignalfd_group *p = to_group(this);
>> +
>> +return ((addr >= p->addr && (addr < p->addr + p->length)));
>> +}
>> 
>
> I think I see a problem here. For virtio, we do not necessarily want all
> virtqueues for a device to live in kernel: there might be control
> virtqueues that we want to leave in userspace. Since this claims all
> writes to a specific address, the signal never makes it to userspace.
>
>   
You can use a wildcard.  Would that work?



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] Re: [PATCH 1/2] allow hypervisor CPUID bit to be overriden

2009-06-23 Thread Paul Brook

On Tuesday 23 June 2009, Avi Kivity wrote:
> On 06/23/2009 12:47 AM, Andre Przywara wrote:
> > KVM defaults to the hypervisor CPUID bit to be set, whereas pure QEMU
> > clears it. On some occasions one want to set or clear it the other way
> > round (for instance to get HyperV running inside a guest).
> > Allow the default to be overridden on the command line and fix some
> > whitespace damage on the way.
>
> It makes sense for qemu to set the hypervisor bit unconditionally.  A
> guest running under qemu is not bare metal.

I see no reason why a guest has to be told that it's running inside a VM.
In principle an appropriately configured qemu should be indistinguishable from 
real hardware. In practice it's technically infeasible to cover absolutely 
everything, but if we set this bit we're not even trying.

I have no objection to the bit being set by default for the QEMU CPU types.

Paul
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [KVM PATCH v8 3/3] KVM: add iosignalfd support

2009-06-23 Thread Avi Kivity


On 06/23/2009 01:48 PM, Michael S. Tsirkin wrote:

On Tue, Jun 23, 2009 at 12:57:53PM +0300, Avi Kivity wrote:
   

On 06/23/2009 11:56 AM, Michael S. Tsirkin wrote:
 

On Thu, Jun 18, 2009 at 08:30:46PM -0400, Gregory Haskins wrote:

   

+static int
+iosignalfd_group_in_range(struct kvm_io_device *this, gpa_t addr, int len,
+ int is_write)
+{
+   struct _iosignalfd_group *p = to_group(this);
+
+   return ((addr>= p->addr&&   (addr<   p->addr + p->length)));
+}

 

I think I see a problem here. For virtio, we do not necessarily want all
virtqueues for a device to live in kernel: there might be control
virtqueues that we want to leave in userspace. Since this claims all
writes to a specific address, the signal never makes it to userspace.

   

Userspace could create an eventfd for this control queue and wait for it
to fire.
 


What if guest writes an unexpected value there?
The value it simply lost ... that's not very elegant.
   


True, it's better to have a lossless interface.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[ kvm-Bugs-2810749 ] linux guest framebuffer fails under vnc

2009-06-23 Thread SourceForge.net

Bugs item #2810749, was opened at 2009-06-23 09:23
Message generated for change (Comment added) made by elfe
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2810749&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Elfe (elfe)
Assigned to: Nobody/Anonymous (nobody)
Summary: linux guest framebuffer fails under vnc

Initial Comment:
When running a linux with framebuffer enabled the vnc server becomes unusable 
and causes the viewer to exit. (reconnecting to the vnc server does not work as 
well)
The machines seems to be still up as it can respond to pings.
booting the gentoo cd with the gentoo-nofb kernel works well
I don't know if it's related but switching a windows guest to a higher 
resolution than 1024x768 caues the vnc client to exit as well. (it can be 
accessed again after waiting the 15 seconds for the display mode confirmation 
timeout)

cpu: 1x Intel(R) Xeon(R) CPU X3320  @ 2.50GHz
kvm version: 0.85 (0.86 is not on gentoo atm)
host kernel: 2.6.29-gentoo-r5
host os/architecture: gentoo x86_64
guest os: gentoo minimal install disc for x86_64, OpenSUSE 11.1 i586
commandline: kvm -drive 
file=/install-amd64-minimal-2008.0-r1.iso,if=ide,media=cdrom -m 768 -vnc :35 
-smp 1 -usbdevice tablet -serial none -parallel none
not helping: -no-kvm-irqchip -no-kvm-pit  -no-kvm

the vncviewer returns errors like:
vncviewer PreferredEncoding=hextile x.x.x.x:35

VNC Viewer Free Edition 4.1.3 for X - built Apr 17 2009 14:25:08
Copyright (C) 2002-2008 RealVNC Ltd.
See http://www.realvnc.com for information on VNC.

Tue Jun 23 09:01:06 2009
 CConn:   connected to host x.x.x.x port 5935
 CConnection: Server supports RFB protocol version 3.8
 CConnection: Using RFB protocol version 3.8
 TXImage: Using default colormap and visual, TrueColor, depth 24.
 CConn:   Using pixel format depth 6 (8bpp) rgb222
 CConn:   Using hextile encoding
Rect too big: 1x0 at 1,1024 exceeds 1024x768

Tue Jun 23 09:01:08 2009
 main:Rect too big


--

>Comment By: Elfe (elfe)
Date: 2009-06-23 13:15

Message:
I was able to work around the issue on OpenSUSE by changing the VESA
800x600 at 16bit to 800x600 with 24 bit color depth by using SDL for
access.
But the general issue with vnc is still there and should be addressed.

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2810749&group_id=180599
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/2][RFC] Kernel changes for HPET legacy mode (v7)

2009-06-23 Thread Jan Kiszka

Beth Kon wrote:
> Avi Kivity wrote:
>> On 06/22/2009 12:14 PM, Jan Kiszka wrote:
> Hmm, stead of introducing a new pair of singe-purpose IOCTLs, why not
> add KVM_GET/SET_PIT2 which exchanges an extended kvm_pit_state2. And
> that struct should also include some flags field and enough padding to
> be potentially extended yet again in the future. In that case I see no
> problem having also a mode read-back interface.
>
>
 We'd only add kernel hpet if we were forced to (I imagine the same
 applications/kernels that forced the PIT into the kernel will do the
 same for HPET).

  
>>>
>>> Answer and citation does not yet correlate for me.
>>>
>>
>> Misquote.  I meant to reply to your 'Is it planned to add in-kernel
>> hpet support?' question.  Must be early in the morning in some timezone.
>>
>>> Could you comment more explicitly if your are fine with Beth's proposed
>>> interface, rather prefer something like my suggestion or even want
>>> something totally different?
>>>
>>
>> GET/SET PIT2 looks like the best choice to me, at least until I find
>> whoever designed the HPET/PIT interdependency and make him take it back.
>>
> It seems to me that GET/SET PIT2 adds a good deal of complexity without
> any gain. PIT is not a very dynamic technology. GET/SET PIT works for
> PIT operational needs as defined by the PIT specifications. This whole
> problem existe because of the unfortunate requirement that hpet disable
> PIT interrupts, which is quite outside normal PIT operation. If I create
> a GET/SET PIT2, and a PITState2 that is a superset of PITState1, then I
> need to address all the cases where PITState is currently set/referenced
> and properly use PITState2/PITState, depending on whether the kernel
> supports PITState2. It just seems unnecessary, since hpet_legacy, and
> probably any other future "control" of the PIT will likely be outside of
> "normal" PIT operation.  I really think a separate ioctl that transfers
> just control information (of which one of the flag bits would be
> hpet_legacy) would be preferable and cleaner. Am I missing some
> advantage to PITState2? Or is there some simple way to implement this
> that I'm missing?

I think you just need to refactor out the processing of
kvm_pit_channel_state and call it from the legacy get/set handler as
well as from the new one. And the new handlers will also process the
additional fields that will come with kvm_pit_state2. Not really more
complex than the current proposal.

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [KVM PATCH v8 3/3] KVM: add iosignalfd support

2009-06-23 Thread Michael S. Tsirkin

On Tue, Jun 23, 2009 at 12:57:53PM +0300, Avi Kivity wrote:
> On 06/23/2009 11:56 AM, Michael S. Tsirkin wrote:
>> On Thu, Jun 18, 2009 at 08:30:46PM -0400, Gregory Haskins wrote:
>>
>>> +static int
>>> +iosignalfd_group_in_range(struct kvm_io_device *this, gpa_t addr, int len,
>>> + int is_write)
>>> +{
>>> +   struct _iosignalfd_group *p = to_group(this);
>>> +
>>> +   return ((addr>= p->addr&&  (addr<  p->addr + p->length)));
>>> +}
>>>  
>>
>> I think I see a problem here. For virtio, we do not necessarily want all
>> virtqueues for a device to live in kernel: there might be control
>> virtqueues that we want to leave in userspace. Since this claims all
>> writes to a specific address, the signal never makes it to userspace.
>>
>
> Userspace could create an eventfd for this control queue and wait for it  
> to fire.

What if guest writes an unexpected value there?
The value it simply lost ... that's not very elegant.

-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/2] introduce -cpu host target

2009-06-23 Thread Avi Kivity


On 06/23/2009 12:47 AM, Andre Przywara wrote:

Although the guest's CPUID bits can be controlled in a fine grained way
in QEMU, a simple way to inject the host CPU is missing. This is handy
for KVM desktop virtualization, where one wants the guest to support the
full host feature set.
Introduce another CPU type called 'host', which will propagate the host's
CPUID bits to the guest. Problematic bits can still be turned off by using
the existing syntax (-cpu host,-skinit)
   


kvm already knows how to filter unknown bits to prevent runtime 
failures.  This is even more important for qemu/tcg, since even simple 
bits which only define new instructions need explicit support in qemu.  
I think -cpu host should default to filtering unsupported bits instead 
of hoping the guest will ignore them or expecting the user to know which 
bits to remove.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/2] allow hypervisor CPUID bit to be overriden

2009-06-23 Thread Avi Kivity


On 06/23/2009 12:47 AM, Andre Przywara wrote:

KVM defaults to the hypervisor CPUID bit to be set, whereas pure QEMU clears
it. On some occasions one want to set or clear it the other way round (for
instance to get HyperV running inside a guest).
Allow the default to be overridden on the command line and fix some
whitespace damage on the way.
   


It makes sense for qemu to set the hypervisor bit unconditionally.  A 
guest running under qemu is not bare metal.


That's a separate change though.  Patch looks good.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [KVM PATCH v8 3/3] KVM: add iosignalfd support

2009-06-23 Thread Avi Kivity


On 06/23/2009 11:56 AM, Michael S. Tsirkin wrote:

On Thu, Jun 18, 2009 at 08:30:46PM -0400, Gregory Haskins wrote:
   

+static int
+iosignalfd_group_in_range(struct kvm_io_device *this, gpa_t addr, int len,
+ int is_write)
+{
+   struct _iosignalfd_group *p = to_group(this);
+
+   return ((addr>= p->addr&&  (addr<  p->addr + p->length)));
+}
 


I think I see a problem here. For virtio, we do not necessarily want all
virtqueues for a device to live in kernel: there might be control
virtqueues that we want to leave in userspace. Since this claims all
writes to a specific address, the signal never makes it to userspace.
   


Userspace could create an eventfd for this control queue and wait for it 
to fire.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC] pass write value to in_range pointers

2009-06-23 Thread Avi Kivity


On 06/22/2009 07:29 PM, Gregory Haskins wrote:

We actually already have aliasing: is_write flag is used for this
purpose.
 


Yes, but read/write address aliasing is not the same thing is
multi-match data aliasing.  Besides, your proposal also breaks some of
the natural relationship models (e.g. all the aliased iosignal_items
always belong to the same underlying device.  io_bus entries have an
arbitrary topology).
   


It's all one big hack, we want to get the correct function called with 
as little fuss as possible.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC] pass write value to in_range pointers

2009-06-23 Thread Avi Kivity


On 06/22/2009 07:08 PM, Michael S. Tsirkin wrote:

On Mon, Jun 22, 2009 at 11:45:00AM -0400, Gregory Haskins wrote:
   

Michael S. Tsirkin wrote:
 

It seems that a lot of complexity and trickiness with iosignalfd is
handling the group/item relationship, which comes about because kvm does
not currently let a device on the bus claim a write transaction based on the
value written.  This could be greatly simplified if the value written
was passed to the in_range check for write operation.  We could then
simply make each kvm_iosignalfd a device on the bus.

What does everyone think of the following lightly tested patch?

   

Hi Michael,
   Its interesting, but I am not convinced its necessary.  We created the
group/item layout because iosignalfds are unique in that they are
probably the only IO device that wants to do some kind of address
aliasing.
 


We actually already have aliasing: is_write flag is used for this
purpose. Actually, it's possible to remove is_write by passing
a null pointer in write_val for reads. I like this a bit less as
the code generated is less compact ... Avi, what do you think?
   


Greg, won't Michael's patch eliminate a big chunk from your iosignalfd 
patches?  Seems like a win to me.



One is enough :)
Seriously, do you see that this saves you all of RCU, linked lists and
counters? You don't need to keep track of iofds, you don't need to
implement your own lookup logic - you just use the kvm device
and that's it.

   


Yup.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [KVM PATCH v8 3/3] KVM: add iosignalfd support

2009-06-23 Thread Michael S. Tsirkin

On Thu, Jun 18, 2009 at 08:30:46PM -0400, Gregory Haskins wrote:
> +static int
> +iosignalfd_group_in_range(struct kvm_io_device *this, gpa_t addr, int len,
> +   int is_write)
> +{
> + struct _iosignalfd_group *p = to_group(this);
> +
> + return ((addr >= p->addr && (addr < p->addr + p->length)));
> +}

I think I see a problem here. For virtio, we do not necessarily want all
virtqueues for a device to live in kernel: there might be control
virtqueues that we want to leave in userspace. Since this claims all
writes to a specific address, the signal never makes it to userspace.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

1 2 >

1 - 100 of 103 matches

Mail list logo