Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-04-08 Thread Antoine Martin
Avi Kivity wrote:
> On 03/24/2010 06:40 PM, Joerg Roedel wrote:
>>
>>> Looks trivial to find a guest, less so with enumerating (still doable).
>>>  
>> Not so trivial and even more likely to break. Even if perf has the pid of
>> the process and wants to find the directory it has to do:
>>
>> 1. Get the uid of the process
>> 2. Find the username for the uid
>> 3. Use the username to find the home-directory
>>
>> Steps 2. and 3. need nsswitch and/or pam access to get this information
>> from whatever source the admin has configured. And depending on what the
>> source is it may be temporarily unavailable causing nasty timeouts. In
>> short, there are many weak parts in that chain making it more likely to
>> break.
>>
> 
> It's true.  If the kernel provides something, there are fewer things
> that can break.  But if your system is so broken that you can't resolve
> uids, fix that before running perf.  Must we design perf for that case?
uid to username can fail when using chroots or, worse, point to an
incorrect location (and yes, I do use this).

Sorry if this has been covered / discussion has moved on. Just catching
up with the 500+ messages in my inbox..

Antoine


> 
> After all, 'ls -l' will break under the same circumstances.  It's hard
> to imagine doing useful work when that doesn't work.
> 
>> A kernel-based approach with /proc//kvm does not have those issues
>> (and to repeat myself, it is independent from the userspace being used).
>>
> 
> It has other issues, which are IMO more problematic.
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-25 Thread Zhang, Yanmin
On Wed, 2010-03-24 at 20:20 +0200, Avi Kivity wrote:
> On 03/24/2010 07:47 PM, Arnaldo Carvalho de Melo wrote:
> > On Wed, Mar 24, 2010 at 06:09:30PM +0200, Avi Kivity wrote:
> >
> >> Doesn't perf already have a dependency on naming conventions for finding
> >> debug information?
> >>  
> > It looks at several places, from most symbol rich (/usr/lib/debug/, aka
> > -debuginfo packages, where we have full symtabs) to poorest (the
> > packaged binary, where we may just have a .dynsym).
> >
> > In an ideal world, it would just get the build-id (a SHA1 cookie that is
> > in an ELF section inserted in every binary (aka DSOs), kernel module,
> > kallsyms or vmlinux file) and use that to look first in a local cache
> > (implemented in perf for a long time already) or in some symbol server.
> >
> > For instance, for a random perf.data file I collected here in my machine
> > I have:
> >
> > [a...@doppio linux-2.6-tip]$ perf buildid-list | grep libpthread
> > 5c68f7afeb33309c78037e374b0deee84dd441f6 /lib64/libpthread-2.10.2.so
> > [a...@doppio linux-2.6-tip]$
> >
> > So I don't have to access /lib64/libpthread-2.10.2.so directly, nor some
> > convention to get a debuginfo in a local file like:
> >
> > /usr/lib/debug/lib64/libpthread-2.10.2.so.debug
> >
> > Instead the tools look at:
> >
> > [a...@doppio linux-2.6-tip]$ l 
> > ~/.debug/.build-id/5c/68f7afeb33309c78037e374b0deee84dd441f6
> > lrwxrwxrwx 1 acme acme 73 2010-01-06 18:53 
> > /home/acme/.debug/.build-id/5c/68f7afeb33309c78037e374b0deee84dd441f6 ->  
> > ../../lib64/libpthread-2.10.2.so/5c68f7afeb33309c78037e374b0deee84dd441f6*
> >
> > To find the file for that specific build-id, not the one installed in my
> > machine (or on the different machine, of a different architecture) that
> > may be completely unrelated, a new one, or one for a different arch.
> >
> 
> Thanks.  I believe qemu could easily act as a symbol server for this use 
> case.


I spent a couple of days investigating why sshfs/fuse doesn't work well with
procfs and sysfs. Just as my patch against fuse was almost ready, I found that
fuse already supports such access through direct I/O. With the parameter
-o direct_io, it works well.

Here is an example that mounts / from a guest OS:
#sshfs -p 5551 -o direct_io localhost:/ guestmount

We can read and write files if the permissions allow it.

Next I will add support for parsing statistics from multiple guest OS
instances.

Yanmin




Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Arnaldo Carvalho de Melo
On Wed, Mar 24, 2010 at 08:20:10PM +0200, Avi Kivity wrote:
> On 03/24/2010 07:47 PM, Arnaldo Carvalho de Melo wrote:
>> On Wed, Mar 24, 2010 at 06:09:30PM +0200, Avi Kivity wrote:
>>
>>> Doesn't perf already have a dependency on naming conventions for finding
>>> debug information?
>>>  
>> It looks at several places, from most symbol rich (/usr/lib/debug/, aka
>> -debuginfo packages, where we have full symtabs) to poorest (the
>> packaged binary, where we may just have a .dynsym).
>>
>> In an ideal world, it would just get the build-id (a SHA1 cookie that is
>> in an ELF section inserted in every binary (aka DSOs), kernel module,
>> kallsyms or vmlinux file) and use that to look first in a local cache
>> (implemented in perf for a long time already) or in some symbol server.
>>
>> For instance, for a random perf.data file I collected here in my machine
>> I have:
>>
>> [a...@doppio linux-2.6-tip]$ perf buildid-list | grep libpthread
>> 5c68f7afeb33309c78037e374b0deee84dd441f6 /lib64/libpthread-2.10.2.so
>> [a...@doppio linux-2.6-tip]$
>>
>> So I don't have to access /lib64/libpthread-2.10.2.so directly, nor some
>> convention to get a debuginfo in a local file like:
>>
>> /usr/lib/debug/lib64/libpthread-2.10.2.so.debug
>>
>> Instead the tools look at:
>>
>> [a...@doppio linux-2.6-tip]$ l 
>> ~/.debug/.build-id/5c/68f7afeb33309c78037e374b0deee84dd441f6
>> lrwxrwxrwx 1 acme acme 73 2010-01-06 18:53 
>> /home/acme/.debug/.build-id/5c/68f7afeb33309c78037e374b0deee84dd441f6 ->  
>> ../../lib64/libpthread-2.10.2.so/5c68f7afeb33309c78037e374b0deee84dd441f6*
>>
>> To find the file for that specific build-id, not the one installed in my
>> machine (or on the different machine, of a different architecture) that
>> may be completely unrelated, a new one, or one for a different arch.

> Thanks.  I believe qemu could easily act as a symbol server for this use  
> case.

Agreed, but it doesn't even have to :-)

We just need to get the build-id in the PERF_RECORD_MMAP event somehow
and then get this symbol from elsewhere, say the same DVD/RHN
channel/Debian Repository/embedded developer toolkit image not
stripped/whatever.

Or it may already be in the local cache from last week's perf report
session :-)

- Arnaldo


Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Avi Kivity

On 03/24/2010 07:47 PM, Arnaldo Carvalho de Melo wrote:
> On Wed, Mar 24, 2010 at 06:09:30PM +0200, Avi Kivity wrote:
>
>> Doesn't perf already have a dependency on naming conventions for finding
>> debug information?
>
> It looks at several places, from most symbol rich (/usr/lib/debug/, aka
> -debuginfo packages, where we have full symtabs) to poorest (the
> packaged binary, where we may just have a .dynsym).
>
> In an ideal world, it would just get the build-id (a SHA1 cookie that is
> in an ELF section inserted in every binary (aka DSOs), kernel module,
> kallsyms or vmlinux file) and use that to look first in a local cache
> (implemented in perf for a long time already) or in some symbol server.
>
> For instance, for a random perf.data file I collected here in my machine
> I have:
>
> [a...@doppio linux-2.6-tip]$ perf buildid-list | grep libpthread
> 5c68f7afeb33309c78037e374b0deee84dd441f6 /lib64/libpthread-2.10.2.so
> [a...@doppio linux-2.6-tip]$
>
> So I don't have to access /lib64/libpthread-2.10.2.so directly, nor some
> convention to get a debuginfo in a local file like:
>
> /usr/lib/debug/lib64/libpthread-2.10.2.so.debug
>
> Instead the tools look at:
>
> [a...@doppio linux-2.6-tip]$ l
> ~/.debug/.build-id/5c/68f7afeb33309c78037e374b0deee84dd441f6
> lrwxrwxrwx 1 acme acme 73 2010-01-06 18:53
> /home/acme/.debug/.build-id/5c/68f7afeb33309c78037e374b0deee84dd441f6 ->
> ../../lib64/libpthread-2.10.2.so/5c68f7afeb33309c78037e374b0deee84dd441f6*
>
> To find the file for that specific build-id, not the one installed in my
> machine (or on the different machine, of a different architecture) that
> may be completely unrelated, a new one, or one for a different arch.

Thanks.  I believe qemu could easily act as a symbol server for this use
case.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Arnaldo Carvalho de Melo
On Wed, Mar 24, 2010 at 06:09:30PM +0200, Avi Kivity wrote:
> Doesn't perf already have a dependency on naming conventions for finding
> debug information?

It looks at several places, from most symbol rich (/usr/lib/debug/, aka
-debuginfo packages, where we have full symtabs) to poorest (the
packaged binary, where we may just have a .dynsym).

In an ideal world, it would just get the build-id (a SHA1 cookie that is
in an ELF section inserted in every binary (aka DSOs), kernel module,
kallsyms or vmlinux file) and use that to look first in a local cache
(implemented in perf for a long time already) or in some symbol server.

For instance, for a random perf.data file I collected here in my machine
I have:

[a...@doppio linux-2.6-tip]$ perf buildid-list | grep libpthread
5c68f7afeb33309c78037e374b0deee84dd441f6 /lib64/libpthread-2.10.2.so
[a...@doppio linux-2.6-tip]$

So I don't have to access /lib64/libpthread-2.10.2.so directly, nor some
convention to get a debuginfo in a local file like:

/usr/lib/debug/lib64/libpthread-2.10.2.so.debug

Instead the tools look at:

[a...@doppio linux-2.6-tip]$ l 
~/.debug/.build-id/5c/68f7afeb33309c78037e374b0deee84dd441f6
lrwxrwxrwx 1 acme acme 73 2010-01-06 18:53 
/home/acme/.debug/.build-id/5c/68f7afeb33309c78037e374b0deee84dd441f6 -> 
../../lib64/libpthread-2.10.2.so/5c68f7afeb33309c78037e374b0deee84dd441f6*

To find the file for that specific build-id, not the one installed in my
machine (or on the different machine, of a different architecture) that
may be completely unrelated, a new one, or one for a different arch.
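
The cache layout in the listing above can be sketched roughly as follows. This
is only an illustration of the path scheme visible in that listing (first two
hex digits as a subdirectory, the remainder as the entry name); the function
name buildid_cache_path and its debug_dir parameter are made up for this
example and are not perf's own API.

```python
import os


def buildid_cache_path(build_id, debug_dir="~/.debug"):
    """Map a hex build-id to its cache entry, following the layout shown above.

    Hypothetical helper: the name and parameters are assumptions, only the
    directory scheme is taken from the listing.
    """
    build_id = build_id.strip().lower()
    if len(build_id) < 3 or any(c not in "0123456789abcdef" for c in build_id):
        raise ValueError("build-id must be a hex string")
    # The first two hex digits name the subdirectory, the rest names the entry.
    return os.path.join(os.path.expanduser(debug_dir), ".build-id",
                        build_id[:2], build_id[2:])


if __name__ == "__main__":
    print(buildid_cache_path("5c68f7afeb33309c78037e374b0deee84dd441f6"))
```

The point of keying on the build-id is that the lookup never depends on where
the binary happens to be installed on the machine doing the analysis.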

- Arnaldo


Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Avi Kivity

On 03/24/2010 06:47 PM, Avi Kivity wrote:
>
> It's true.  If the kernel provides something, there are fewer things
> that can break.  But if your system is so broken that you can't
> resolve uids, fix that before running perf.  Must we design perf for
> that case?
>
> After all, 'ls -l' will break under the same circumstances.  It's hard
> to imagine doing useful work when that doesn't work.

Also, perf itself will hang if it needs to access a file using autofs or
nfs, and those are broken.


--
error compiling committee.c: too many arguments to function



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Avi Kivity

On 03/24/2010 06:45 PM, Joerg Roedel wrote:
>> That's just what I want to do.  Leave it in userspace and then they can
>> deal with it without telling us about it.
>
> They can't do that with a directory in /proc?

I don't know.

--
error compiling committee.c: too many arguments to function



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Avi Kivity

On 03/24/2010 06:40 PM, Joerg Roedel wrote:
>
>> Looks trivial to find a guest, less so with enumerating (still doable).
>
> Not so trivial and even more likely to break. Even if perf has the pid of
> the process and wants to find the directory it has to do:
>
> 1. Get the uid of the process
> 2. Find the username for the uid
> 3. Use the username to find the home-directory
>
> Steps 2. and 3. need nsswitch and/or pam access to get this information
> from whatever source the admin has configured. And depending on what the
> source is it may be temporarily unavailable causing nasty timeouts. In
> short, there are many weak parts in that chain making it more likely to
> break.

It's true.  If the kernel provides something, there are fewer things
that can break.  But if your system is so broken that you can't resolve
uids, fix that before running perf.  Must we design perf for that case?

After all, 'ls -l' will break under the same circumstances.  It's hard
to imagine doing useful work when that doesn't work.

> A kernel-based approach with /proc//kvm does not have those issues
> (and to repeat myself, it is independent from the userspace being used).

It has other issues, which are IMO more problematic.

--
error compiling committee.c: too many arguments to function



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Peter Zijlstra
On Wed, 2010-03-24 at 17:23 +0100, Joerg Roedel wrote:
> On Wed, Mar 24, 2010 at 05:03:42PM +0100, Peter Zijlstra wrote:
> > On Wed, 2010-03-24 at 16:01 +0100, Joerg Roedel wrote:
> > 
> > > What I meant was: perf-kernel puts the guest-name into every sample and
> > > perf-userspace accesses /sys/kvm/guest_name/fs/ later to resolve the
> > > symbols. I leave the question of how the guest-fs is exposed to the host
> > > out of this discussion. We should discuss this separately.
> > 
> > I'd much prefer a pid like suggested later, keeps the samples smaller.
> > 
> > But that said, we need guest kernel events like mmap and context
> > switches too, otherwise we simply can't make sense of guest userspace
> > addresses, we need to know the guest address space layout.
> 
> With the filesystem approach all we need is the pid of the guest
> process. Then we can access proc//maps of the guest and read out the
> address space layout, no?

No, what if it maps new things after you read it? But still getting the
pid of the guest process seems non trivial without guest kernel support.
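
To make the snapshot problem concrete, here is a small sketch that parses text
in the Linux /proc/<pid>/maps format and resolves an address against it. The
sample data and the names parse_maps and find_mapping are invented for
illustration. Anything the process maps after the text was read is simply
absent from the result, which is why mmap events are needed on top of a
one-time read.

```python
from typing import List, NamedTuple, Optional


class Mapping(NamedTuple):
    start: int
    end: int
    perms: str
    offset: int
    path: str


def parse_maps(text: str) -> List[Mapping]:
    """Parse a snapshot in /proc/<pid>/maps format into Mapping records."""
    mappings = []
    for line in text.splitlines():
        # Fields: address range, perms, offset, dev, inode, optional path.
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue
        start, end = (int(x, 16) for x in fields[0].split("-"))
        path = fields[5].strip() if len(fields) > 5 else ""
        mappings.append(Mapping(start, end, fields[1], int(fields[2], 16), path))
    return mappings


def find_mapping(mappings: List[Mapping], addr: int) -> Optional[Mapping]:
    """Return the mapping containing addr, or None if the snapshot has none."""
    for m in mappings:
        if m.start <= addr < m.end:
            return m
    return None


# Made-up sample snapshot, for illustration only.
SAMPLE = """\
00400000-0040b000 r-xp 00000000 08:01 1234 /bin/cat
7f0000000000-7f0000020000 r-xp 00000000 08:01 5678 /lib64/libpthread-2.10.2.so
7f0000020000-7f0000021000 rw-p 00000000 00:00 0
"""
```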


Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Joerg Roedel
On Wed, Mar 24, 2010 at 06:32:51PM +0200, Avi Kivity wrote:
> On 03/24/2010 06:31 PM, Joerg Roedel wrote:

> That's just what I want to do.  Leave it in userspace and then they can  
> deal with it without telling us about it.

They can't do that with a directory in /proc?



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Joerg Roedel
On Wed, Mar 24, 2010 at 06:09:30PM +0200, Avi Kivity wrote:
> On 03/24/2010 05:59 PM, Joerg Roedel wrote:
>>>> I am not tied to /sys/kvm. We could also use /proc//kvm/ for
>>>> example. This would keep anything in the process space (except for the
>>>> global list of VMs which we should have anyway).
>>>>
>>> How about ~/.qemu/guests/$pid?
>>>  
>> That makes it hard for perf to find it and even harder to get a list of
>> all VMs.
>
> Looks trivial to find a guest, less so with enumerating (still doable).

Not so trivial and even more likely to break. Even if perf has the pid of
the process and wants to find the directory it has to do:

1. Get the uid of the process
2. Find the username for the uid
3. Use the username to find the home-directory

Steps 2. and 3. need nsswitch and/or pam access to get this information
from whatever source the admin has configured. And depending on what the
source is it may be temporarily unavailable causing nasty timeouts. In
short, there are many weak parts in that chain making it more likely to
break.
A kernel-based approach with /proc//kvm does not have those issues
(and to repeat myself, it is independent from the userspace being used).
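
For reference, the three steps above look roughly like this in Python. It is a
sketch only: guest_dir_for_pid and its proc_root parameter are invented names,
the ~/.qemu/guests/$pid layout is just the suggestion from earlier in this
thread, and taking the owner of the /proc entry as the process uid is an
approximation. The pwd lookup is the part that goes through whatever NSS
sources the admin configured, so it is the part that can fail or stall.

```python
import os
import pwd


def guest_dir_for_pid(pid, proc_root="/proc"):
    """Return the assumed per-guest directory for pid, or None if unresolvable.

    Hypothetical helper: the directory layout is the one proposed in the
    thread, not an existing qemu or perf convention.
    """
    try:
        # Step 1: the owner of the /proc entry approximates the process uid.
        uid = os.stat(os.path.join(proc_root, str(pid))).st_uid
    except OSError:
        return None  # no such process, or /proc not available
    try:
        # Steps 2 and 3: one passwd lookup yields both username and home
        # directory. A uid with no passwd entry (e.g. in a chroot) raises
        # KeyError.
        home = pwd.getpwuid(uid).pw_dir
    except KeyError:
        return None
    return os.path.join(home, ".qemu", "guests", str(pid))
```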

Joerg



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Avi Kivity

On 03/24/2010 06:31 PM, Joerg Roedel wrote:
> On Wed, Mar 24, 2010 at 06:20:38PM +0200, Avi Kivity wrote:
>> On 03/24/2010 06:17 PM, Joerg Roedel wrote:
>>> But is this not only one entity more for
>>> sVirt to handle? I would leave that decision to the sVirt developers.
>>> Does attaching the same label as for the VM resources mean that root
>>> could not access it anymore?
>>
>> IIUC processes run under a context, and there's a policy somewhere that
>> tells you which context can access which label (and with what
>> permissions).  There was a server on the Internet once that gave you
>> root access and invited you to attack it.  No idea if anyone succeeded
>> or not (I got bored after about a minute).
>>
>> So it depends on the policy.  If you attach the same label, that means
>> all files with the same label have the same access permissions.  I think.
>
> So if this is true we can introduce a 'trace' label and add all contexts
> that should be allowed to trace to it.
> But we probably should leave the details to the security experts ;-)

That's just what I want to do.  Leave it in userspace and then they can
deal with it without telling us about it.


--
error compiling committee.c: too many arguments to function



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Joerg Roedel
On Wed, Mar 24, 2010 at 06:20:38PM +0200, Avi Kivity wrote:
> On 03/24/2010 06:17 PM, Joerg Roedel wrote:
>> But is this not only one entity more for
>> sVirt to handle? I would leave that decision to the sVirt developers.
>> Does attaching the same label as for the VM resources mean that root
>> could not access it anymore?
>>
>
> IIUC processes run under a context, and there's a policy somewhere that  
> tells you which context can access which label (and with what  
> permissions).  There was a server on the Internet once that gave you  
> root access and invited you to attack it.  No idea if anyone succeeded  
> or not (I got bored after about a minute).
>
> So it depends on the policy.  If you attach the same label, that means  
> all files with the same label have the same access permissions.  I think.

So if this is true we can introduce a 'trace' label and add all contexts
that should be allowed to trace to it.
But we probably should leave the details to the security experts ;-)

Joerg



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Joerg Roedel
On Wed, Mar 24, 2010 at 05:03:42PM +0100, Peter Zijlstra wrote:
> On Wed, 2010-03-24 at 16:01 +0100, Joerg Roedel wrote:
> 
> > What I meant was: perf-kernel puts the guest-name into every sample and
> > perf-userspace accesses /sys/kvm/guest_name/fs/ later to resolve the
> > symbols. I leave the question of how the guest-fs is exposed to the host
> > out of this discussion. We should discuss this separately.
> 
> I'd much prefer a pid like suggested later, keeps the samples smaller.
> 
> But that said, we need guest kernel events like mmap and context
> switches too, otherwise we simply can't make sense of guest userspace
> addresses, we need to know the guest address space layout.

With the filesystem approach all we need is the pid of the guest
process. Then we can access proc//maps of the guest and read out the
address space layout, no?

Joerg



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Avi Kivity

On 03/24/2010 06:17 PM, Joerg Roedel wrote:
> On Wed, Mar 24, 2010 at 05:52:54PM +0200, Avi Kivity wrote:
>> On 03/24/2010 05:50 PM, Joerg Roedel wrote:
>>> If we go the /proc//kvm way then the directory should probably
>>> inherit the label from /proc//?
>>
>> That's a security policy.  The security people like their policies
>> outside the kernel.
>>
>> For example, they may want a label that allows a trace context to read
>> the data, and also qemu itself for introspection.
>
> Hm, I am not a security expert.

I'm out of my depth here as well.

> But is this not only one entity more for
> sVirt to handle? I would leave that decision to the sVirt developers.
> Does attaching the same label as for the VM resources mean that root
> could not access it anymore?

IIUC processes run under a context, and there's a policy somewhere that
tells you which context can access which label (and with what
permissions).  There was a server on the Internet once that gave you
root access and invited you to attack it.  No idea if anyone succeeded
or not (I got bored after about a minute).

So it depends on the policy.  If you attach the same label, that means
all files with the same label have the same access permissions.  I think.


--
error compiling committee.c: too many arguments to function



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Avi Kivity

On 03/24/2010 06:03 PM, Peter Zijlstra wrote:
> On Wed, 2010-03-24 at 16:01 +0100, Joerg Roedel wrote:
>
>> What I meant was: perf-kernel puts the guest-name into every sample and
>> perf-userspace accesses /sys/kvm/guest_name/fs/ later to resolve the
>> symbols. I leave the question of how the guest-fs is exposed to the host
>> out of this discussion. We should discuss this separately.
>
> I'd much prefer a pid like suggested later, keeps the samples smaller.
>
> But that said, we need guest kernel events like mmap and context
> switches too, otherwise we simply can't make sense of guest userspace
> addresses, we need to know the guest address space layout.

The kernel knows some of the address space layout, qemu knows all of it.

> So aside from a filesystem content, we first need mmap and context
> switch events to find the files we need to access.

This only works for the guest kernel, we don't know anything about guest
processes [1].

> And while I appreciate all the security talk, it's basically pointless
> anyway, the host can access it anyway, everybody agrees on that, but
> still you're arguing the case..

root can access anything, but we're not talking about root.  The idea is
to protect against a guest that has exploited its qemu and is now
attacking the host and its fellow guests.   uid protection is no good
since we want to isolate the guest from host processes belonging to the
same uid and from other guests running under the same uid.


[1] We can find out guest pids if we teach the kernel what to 
dereference, i.e. gs:offset1->offset2->offset3.  Of course this varies 
from kernel to kernel, so we need some kind of bytecode that we can run 
in perf nmi context.  Kind of what we need to run an unwinder for 
-fomit-frame-pointer.
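
As a rough illustration of the gs:offset1->offset2->offset3 idea in [1], the
sketch below walks a chain of offsets through a flat byte buffer standing in
for guest memory. Everything here is hypothetical: the helper names, the
offsets, the 64-bit little-endian pointer size, and the demo layout are made up
for the example and do not describe any real kernel's structures or an existing
perf interface.

```python
import struct


def read_u64(mem, addr):
    """Read a little-endian 64-bit value at addr from a flat memory image."""
    return struct.unpack_from("<Q", mem, addr)[0]


def follow_chain(mem, base, offsets):
    """Evaluate base:off1->off2->...: add each offset, then dereference."""
    value = base
    for off in offsets:
        value = read_u64(mem, value + off)
    return value


def build_demo_memory():
    """Build a tiny fake memory image (all addresses and offsets invented)."""
    mem = bytearray(64)
    struct.pack_into("<Q", mem, 8, 16)     # base+8 holds a pointer to 16
    struct.pack_into("<Q", mem, 24, 40)    # 16+8 holds a pointer to 40
    struct.pack_into("<Q", mem, 44, 1234)  # 40+4 holds the value we want
    return mem


if __name__ == "__main__":
    # Base 0 plays the role of the gs segment base in this toy image.
    print(follow_chain(build_demo_memory(), 0, [8, 8, 4]))
```

The offset list is the per-kernel part, which is why the footnote talks about
some kind of bytecode rather than hard-coded offsets.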


--
error compiling committee.c: too many arguments to function



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Joerg Roedel
On Wed, Mar 24, 2010 at 05:52:54PM +0200, Avi Kivity wrote:
> On 03/24/2010 05:50 PM, Joerg Roedel wrote:
>> If we go the /proc//kvm way then the directory should probably
>> inherit the label from /proc//?
>
> That's a security policy.  The security people like their policies  
> outside the kernel.
>
> For example, they may want a label that allows a trace context to read  
> the data, and also qemu itself for introspection.

Hm, I am not a security expert. But is this not only one entity more for
sVirt to handle? I would leave that decision to the sVirt developers.
Does attaching the same label as for the VM resources mean that root
could not access it anymore?

Joerg



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Avi Kivity

On 03/24/2010 05:59 PM, Joerg Roedel wrote:
>>> I am not tied to /sys/kvm. We could also use /proc//kvm/ for
>>> example. This would keep anything in the process space (except for the
>>> global list of VMs which we should have anyway).
>>
>> How about ~/.qemu/guests/$pid?
>
> That makes it hard for perf to find it and even harder to get a list of
> all VMs.

Looks trivial to find a guest, less so with enumerating (still doable).

> With /proc//kvm/guest we could symlink all guest
> directories to /proc/kvm/ and perf reads the list from there. Also perf
> can easily derive the directory for a guest from its pid.
> Last but not least it's kernel-created and thus independent from the
> userspace part being used.

Doesn't perf already have a dependency on naming conventions for finding
debug information?


--
error compiling committee.c: too many arguments to function



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Peter Zijlstra
On Wed, 2010-03-24 at 16:01 +0100, Joerg Roedel wrote:

> What I meant was: perf-kernel puts the guest-name into every sample and
> perf-userspace accesses /sys/kvm/guest_name/fs/ later to resolve the
> symbols. I leave the question of how the guest-fs is exposed to the host
> out of this discussion. We should discuss this separately.

I'd much prefer a pid like suggested later, keeps the samples smaller.

But that said, we need guest kernel events like mmap and context
switches too, otherwise we simply can't make sense of guest userspace
addresses, we need to know the guest address space layout.

So aside from a filesystem content, we first need mmap and context
switch events to find the files we need to access.

And while I appreciate all the security talk, it's basically pointless
anyway, the host can access it anyway, everybody agrees on that, but
still you're arguing the case..


Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Joerg Roedel
On Wed, Mar 24, 2010 at 05:49:42PM +0200, Avi Kivity wrote:
> On 03/24/2010 05:46 PM, Joerg Roedel wrote:
>> On Wed, Mar 24, 2010 at 05:12:55PM +0200, Avi Kivity wrote:
>>
>>> On 03/24/2010 05:01 PM, Joerg Roedel wrote:
>>>
>>>> $ cd /sys/kvm/guest0
>>>> $ ls -l
>>>> -r 1 root root 0 2009-08-17 12:05 name
>>>> dr-x-- 1 root root 0 2009-08-17 12:05 fs
>>>> $ cat name
>>>> guest0
>>>> $ # ...
>>>>
>>>> The fs/ directory is used as the mount point for the guest root fs.
>>>>
>>> The problem is /sys/kvm, not /sys/kvm/fs.
>>>  
>> I am not tied to /sys/kvm. We could also use /proc//kvm/ for
>> example. This would keep anything in the process space (except for the
>> global list of VMs which we should have anyway).
>>
>
> How about ~/.qemu/guests/$pid?

That makes it hard for perf to find it and even harder to get a list of
all VMs. With /proc//kvm/guest we could symlink all guest
directories to /proc/kvm/ and perf reads the list from there. Also perf
can easily derive the directory for a guest from its pid.
Last but not least it's kernel-created and thus independent from the
userspace part being used.
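
To show what the userspace side of that could look like, here is a sketch that
enumerates guests from a directory of symlinks and derives each guest's pid
from the link target. The interface is purely hypothetical: no such directory
exists today, and the function name list_guests, the link names, and the
assumed target shape ".../<pid>/kvm/guest" are all assumptions made for this
example.

```python
import os


def list_guests(kvm_root):
    """Map guest link names to pids for a hypothetical symlink directory.

    Assumes every guest is a symlink whose target ends in <pid>/kvm/guest.
    Entries that are not symlinks or do not match that shape are skipped.
    """
    guests = {}
    for name in sorted(os.listdir(kvm_root)):
        link = os.path.join(kvm_root, name)
        if not os.path.islink(link):
            continue
        parts = os.readlink(link).split("/")
        # Expect the target to end in .../<pid>/kvm/guest.
        if len(parts) >= 3 and parts[-2] == "kvm" and parts[-3].isdigit():
            guests[name] = int(parts[-3])
    return guests
```

With such a layout neither enumeration nor the pid-to-directory mapping needs a
passwd lookup.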

Joerg



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Avi Kivity

On 03/24/2010 05:50 PM, Joerg Roedel wrote:
> On Wed, Mar 24, 2010 at 05:43:31PM +0200, Avi Kivity wrote:
>> On 03/24/2010 05:37 PM, Joerg Roedel wrote:
>>> Even better. So a guest which breaks out can't even access its own
>>> /sys/kvm/ directory. Perfect, it doesn't need that access anyway.
>>
>> But what security label does that directory have?  How can we make sure
>> that whoever needs access to those files, gets them?
>>
>> Automatically created objects don't work well with that model.  They're
>> simply missing information.
>
> If we go the /proc//kvm way then the directory should probably
> inherit the label from /proc//?

That's a security policy.  The security people like their policies
outside the kernel.

For example, they may want a label that allows a trace context to read
the data, and also qemu itself for introspection.

> Same could be applied to /sys/kvm/guest/ if we decide for it. The VM is
> still bound to a single process with a /proc/ after all.

Ditto.

--
error compiling committee.c: too many arguments to function



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Joerg Roedel
On Wed, Mar 24, 2010 at 05:43:31PM +0200, Avi Kivity wrote:
> On 03/24/2010 05:37 PM, Joerg Roedel wrote:
>> Even better. So a guest which breaks out can't even access its own
>> /sys/kvm/ directory. Perfect, it doesn't need that access anyway.
>
> But what security label does that directory have?  How can we make sure  
> that whoever needs access to those files, gets them?
>
> Automatically created objects don't work well with that model.  They're  
> simply missing information.

If we go the /proc//kvm way then the directory should probably
inherit the label from /proc//?
Same could be applied to /sys/kvm/guest/ if we decide for it. The VM is
still bound to a single process with a /proc/ after all.

Joerg



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Avi Kivity

On 03/24/2010 05:46 PM, Joerg Roedel wrote:
> On Wed, Mar 24, 2010 at 05:12:55PM +0200, Avi Kivity wrote:
>> On 03/24/2010 05:01 PM, Joerg Roedel wrote:
>>> $ cd /sys/kvm/guest0
>>> $ ls -l
>>> -r 1 root root 0 2009-08-17 12:05 name
>>> dr-x-- 1 root root 0 2009-08-17 12:05 fs
>>> $ cat name
>>> guest0
>>> $ # ...
>>>
>>> The fs/ directory is used as the mount point for the guest root fs.
>>
>> The problem is /sys/kvm, not /sys/kvm/fs.
>
> I am not tied to /sys/kvm. We could also use /proc//kvm/ for
> example. This would keep anything in the process space (except for the
> global list of VMs which we should have anyway).


How about ~/.qemu/guests/$pid?
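
A minimal sketch of how a tool could resolve such a path from a qemu
pid. The ~/.qemu/guests/$pid layout is only the suggestion above
(nothing creates it today), and both function names are made up for
illustration. The lookup goes through the passwd database, so it
returns None when the pid or the uid cannot be resolved.

```python
import os
import pwd


def guest_dir_in_home(home, pid):
    """Build the hypothetical per-guest directory under a home directory."""
    return os.path.join(home, ".qemu", "guests", str(pid))


def guest_dir_for_pid(pid):
    """Return <home of owner>/.qemu/guests/<pid> for the owner of `pid`.

    Returns None if the process does not exist or its uid has no
    passwd entry (for example inside a chroot).
    """
    try:
        uid = os.stat("/proc/%d" % pid).st_uid  # owner of the qemu process
        home = pwd.getpwuid(uid).pw_dir         # uid -> home directory
    except (OSError, KeyError):
        return None
    return guest_dir_in_home(home, pid)
```

Keeping the path construction separate from the uid lookup means the
fragile part (the passwd query) stays in one place and can fail cleanly.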

--
error compiling committee.c: too many arguments to function



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Joerg Roedel
On Wed, Mar 24, 2010 at 05:12:55PM +0200, Avi Kivity wrote:
> On 03/24/2010 05:01 PM, Joerg Roedel wrote:
>> $ cd /sys/kvm/guest0
>> $ ls -l
>> -r 1 root root 0 2009-08-17 12:05 name
>> dr-x-- 1 root root 0 2009-08-17 12:05 fs
>> $ cat name
>> guest0
>> $ # ...
>>
>> The fs/ directory is used as the mount point for the guest root fs.
>
> The problem is /sys/kvm, not /sys/kvm/fs.

I am not tied to /sys/kvm. We could also use /proc//kvm/ for
example. This would keep anything in the process space (except for the
global list of VMs which we should have anyway).

>> What I meant was: perf-kernel puts the guest-name into every sample and
>> perf-userspace accesses /sys/kvm/guest_name/fs/ later to resolve the
>> symbols. I leave the question of how the guest-fs is exposed to the host
>> out of this discussion. We should discuss this separately.
>
> How I see it: perf-kernel puts the guest pid into every sample, and  
> perf-userspace uses that to resolve to a mountpoint served by fuse, or  
> to a unix domain socket that serves the files.

We need a bit more information than just the qemu-pid, but yes, this
would also work out.
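
To make the two variants concrete, here is a hedged sketch of the
userspace side only: each sample carries a guest identifier (a name or
a qemu pid), and perf-userspace turns it into a host path under some
per-guest mount point. The "fs" subdirectory and the /sys/kvm root come
from the examples in this thread; symbol_path is a hypothetical helper,
not part of perf.

```python
import os


def symbol_path(mount_root, guest_id, guest_path):
    """Map a path inside the guest to a host path under <root>/<id>/fs/.

    The guest path is normalised first so that ".." components coming
    from the guest cannot escape the per-guest directory.
    """
    relative = os.path.normpath("/" + guest_path).lstrip("/")
    return os.path.join(mount_root, str(guest_id), "fs", relative)
```

The same helper works whether the identifier is a guest name or a pid;
only the mount root differs between the kernel and the fuse variant.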

>> If a vm breaks into qemu it can access the host file system which is the
>> bigger problem. In this case there is no isolation anymore. From that
>> context it can even kill other VMs of the same user independent of a
>> hypothetical /sys/kvm/.
>
> It cannot.  sVirt labels the disk image and other files qemu needs with  
> the appropriate label, and everything else is off limits.  Even if you  
> run the guest as root, it won't have access to other files.

See my reply to Daniel's email.

>> Yes, but it's different from the implementation point-of-view. For the
>> user it surely all plays together.
>
> We need qemu to cooperate for mmio tracing, and we can cooperate with  
> qemu for symbol resolution.  If it prevents adding another kernel API,  
> that's a win from my POV.

That's true. Probably qemu can inject this information in the
kvm-trace-events stream.

Joerg



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Avi Kivity

On 03/24/2010 05:37 PM, Joerg Roedel wrote:
>> No it can't. With sVirt every single VM has a custom security label and
>> the policy only allows it access to disks / files with a matching label,
>> and prevents it attacking any other VMs or processes on the host. This
>> confines the scope of any exploit in QEMU to those resources the admin
>> has explicitly assigned to the guest.
>
> Even better. So a guest which breaks out can't even access its own
> /sys/kvm/ directory. Perfect, it doesn't need that access anyway.

But what security label does that directory have?  How can we make sure
that whoever needs access to those files, gets them?

Automatically created objects don't work well with that model.  They're
simply missing information.


--
error compiling committee.c: too many arguments to function



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Joerg Roedel
On Wed, Mar 24, 2010 at 03:26:53PM +, Daniel P. Berrange wrote:
> On Wed, Mar 24, 2010 at 04:01:37PM +0100, Joerg Roedel wrote:
> > >> An approach like: "The files are owned and only readable by the same
> > >> user that started the vm." might be a good start. So a user can measure
> > >> its own guests and root can measure all of them.
> > >
> > > That's not how sVirt works.  sVirt isolates a user's VMs from each  
> > > other, so if a guest breaks into qemu it can't break into other guests  
> > > owned by the same user.
> > 
> > If a vm breaks into qemu it can access the host file system which is the
> > bigger problem. In this case there is no isolation anymore. From that
> > context it can even kill other VMs of the same user independent of a
> > hypothetical /sys/kvm/.
> 
> No it can't. With sVirt every single VM has a custom security label and
> the policy only allows it access to disks / files with a matching label,
> and prevents it attacking any other VMs or processes on the host. This
> confines the scope of any exploit in QEMU to those resources the admin
> has explicitly assigned to the guest.

Even better. So a guest which breaks out can't even access its own
/sys/kvm/ directory. Perfect, it doesn't need that access anyway.

Joerg



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Daniel P. Berrange
On Wed, Mar 24, 2010 at 04:01:37PM +0100, Joerg Roedel wrote:
> >> An approach like: "The files are owned and only readable by the same
> >> user that started the vm." might be a good start. So a user can measure
> >> its own guests and root can measure all of them.
> >
> > That's not how sVirt works.  sVirt isolates a user's VMs from each  
> > other, so if a guest breaks into qemu it can't break into other guests  
> > owned by the same user.
> 
> If a vm breaks into qemu it can access the host file system which is the
> bigger problem. In this case there is no isolation anymore. From that
> context it can even kill other VMs of the same user independent of a
> hypothetical /sys/kvm/.

No it can't. With sVirt every single VM has a custom security label and
the policy only allows it access to disks / files with a matching label,
and prevents it attacking any other VMs or processes on the host. This
confines the scope of any exploit in QEMU to those resources the admin
has explicitly assigned to the guest.
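
As a rough illustration of "matching label", the toy model below
compares SELinux contexts of the form user:role:type:level:categories.
It assumes the usual sVirt naming (svirt_t for the qemu process,
svirt_image_t for its files) and treats access as allowed only when the
category sets are identical. This is a deliberately simplified sketch,
not how the kernel evaluates policy, and the function names are
invented for the example.

```python
def parse_context(label):
    """Split an SELinux context into (type, frozenset of MCS categories)."""
    fields = label.split(":", 4)  # user:role:type:sensitivity[:categories]
    if len(fields) < 4:
        raise ValueError("not an SELinux context: %r" % label)
    if len(fields) == 5:
        categories = frozenset(fields[4].split(","))
    else:
        categories = frozenset()
    return fields[2], categories


def guest_may_access(process_label, file_label):
    """Toy rule: an sVirt guest reaches only image files with its own categories."""
    proc_type, proc_cats = parse_context(process_label)
    file_type, file_cats = parse_context(file_label)
    return (proc_type == "svirt_t"
            and file_type == "svirt_image_t"
            and bool(proc_cats)
            and proc_cats == file_cats)
```

Under this model a file that nobody labelled for the guest, such as an
automatically created directory, is simply unreachable.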

Regards,
Daniel
-- 
|: Red Hat, Engineering, London-o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org-o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|


Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Avi Kivity

On 03/24/2010 05:01 PM, Joerg Roedel wrote:
>> But when I weigh the benefit of truly transparent system-wide perf
>> integration for users who don't use libvirt but do use perf, versus
>> the cost of transforming kvm from a single-process API to a
>> system-wide API with all the complications that I've listed, it comes
>> out in favour of not adding the API.
>
> It's not a transformation, it's an extension. The current per-process
> /dev/kvm stays mostly untouched. It's all about having something like
> this:
>
> $ cd /sys/kvm/guest0
> $ ls -l
> -r 1 root root 0 2009-08-17 12:05 name
> dr-x-- 1 root root 0 2009-08-17 12:05 fs
> $ cat name
> guest0
> $ # ...
>
> The fs/ directory is used as the mount point for the guest root fs.

The problem is /sys/kvm, not /sys/kvm/fs.

>>> The samples will be tagged with the guest-name (and some additional
>>> information perf needs). Perf userspace can access the symbols then
>>> through /sys/kvm/guest0/fs/...
>>
>> I take that as a yes?  So we need a virtio-serial client in the kernel
>> (which might be exploitable by a malicious guest if buggy) and a
>> fs-over-virtio-serial client in the kernel (also exploitable).
>
> What I meant was: perf-kernel puts the guest-name into every sample and
> perf-userspace accesses /sys/kvm/guest_name/fs/ later to resolve the
> symbols. I leave the question of how the guest-fs is exposed to the host
> out of this discussion. We should discuss this separately.

How I see it: perf-kernel puts the guest pid into every sample, and
perf-userspace uses that to resolve to a mountpoint served by fuse, or
to a unix domain socket that serves the files.

>>> An approach like: "The files are owned and only readable by the same
>>> user that started the vm." might be a good start. So a user can measure
>>> its own guests and root can measure all of them.
>>
>> That's not how sVirt works.  sVirt isolates a user's VMs from each
>> other, so if a guest breaks into qemu it can't break into other guests
>> owned by the same user.
>
> If a vm breaks into qemu it can access the host file system which is the
> bigger problem. In this case there is no isolation anymore. From that
> context it can even kill other VMs of the same user independent of a
> hypothetical /sys/kvm/.

It cannot.  sVirt labels the disk image and other files qemu needs with
the appropriate label, and everything else is off limits.  Even if you
run the guest as root, it won't have access to other files.

>>> Yeah that would be interesting information. But it is more related to
>>> tracing than to pmu measurements.  The information which you
>>> mentioned above is probably better captured by an extension of
>>> trace-events to userspace.
>>
>> It's all related.  You start with perf, see a problem with mmio, call up
>> a histogram of mmio or interrupts or whatever, then zoom in on the
>> misbehaving device.
>
> Yes, but it's different from the implementation point-of-view. For the
> user it surely all plays together.

We need qemu to cooperate for mmio tracing, and we can cooperate with
qemu for symbol resolution.  If it prevents adding another kernel API,
that's a win from my POV.


--
error compiling committee.c: too many arguments to function



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Avi Kivity

On 03/24/2010 04:24 PM, Alexander Graf wrote:
> Avi Kivity wrote:
>> On 03/24/2010 03:53 PM, Alexander Graf wrote:
>>>> Someone needs to know about the new guest to fetch its symbols.  Or do
>>>> you want that part in the kernel too?
>>>
>>> How about we add a virtio "guest file system access" device? The guest
>>> would then expose its own file system using that device.
>>>
>>> On the host side this would simply be a -virtioguestfs
>>> unix:/tmp/guest.fs and you'd get a unix socket that gives you full
>>> access to the guest file system by using commands. I envision something
>>> like:
>>
>> The idea is to use a dedicated channel over virtio-serial.  If the
>> channel is present the file server can serve files over it.
>
> The file server being a kernel module inside the guest? We want to be
> able to serve things as early and hassle free as possible, so in this
> case I agree with Ingo that a kernel module is superior.

No, just a daemon.  If it's important enough we can get distributions to
package it by default, and then it will be hassle free.  If "early
enough" is also so important, we can get it to start up on initrd.  If
it's really critical, we can patch grub to serve the files as well.

>>> SEND: GET /proc/version
>>> RECV: Linux version 2.6.27.37-0.1-default (ge...@buildhost) (gcc version
>>> 4.3.2 [gcc-4_3-branch revision 141291] (SUSE Linux) ) #1 SMP 2009-10-15
>>> 14:56:58 +0200
>>>
>>> Now all we need is integration in perf to enumerate virtual machines
>>> based on libvirt. If you want to run qemu-kvm directly, just go with
>>> --guestfs=/tmp/guest.fs and perf could fetch all required information
>>> automatically.
>>>
>>> This should solve all issues while staying 100% in user space, right?
>>
>> Yeah, needs a fuse filesystem to populate the host namespace (kind of
>> sshfs over virtio-serial).
>
> I don't see why we need a fuse filesystem. We can of course create one
> later on. But for now all you need is a user connecting to that socket.

If the perf app knows the protocol, no problem.  But leave perf with
pure filesystem access and hide the details in fuse.


--
error compiling committee.c: too many arguments to function



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Joerg Roedel
On Wed, Mar 24, 2010 at 03:57:39PM +0200, Avi Kivity wrote:
> On 03/24/2010 03:46 PM, Joerg Roedel wrote:

>> Someone who uses libvirt and virt-manager by default is probably not
>> interested in this feature at the same level a kvm developer is. And
>> developers tend not to use libvirt for low-level kvm development.  A
>> number of developers have stated in this thread already that they would
>> appreciate a solution for guest enumeration that would not involve
>> libvirt.
>
> So would I.

Great.

> But when I weigh the benefit of truly transparent  system-wide perf
> integration for users who don't use libvirt but do use  perf, versus
> the cost of transforming kvm from a single-process API to a
> system-wide API with all the complications that I've listed, it comes
> out in favour of not adding the API.

It's not a transformation, it's an extension. The current per-process
/dev/kvm stays mostly untouched. It's all about having something like
this:

$ cd /sys/kvm/guest0
$ ls -l
-r 1 root root 0 2009-08-17 12:05 name
dr-x-- 1 root root 0 2009-08-17 12:05 fs
$ cat name
guest0
$ # ...

The fs/ directory is used as the mount point for the guest root fs.

>> The samples will be tagged with the guest-name (and some additional
>> information perf needs). Perf userspace can access the symbols then
>> through /sys/kvm/guest0/fs/...
>
> I take that as a yes?  So we need a virtio-serial client in the kernel  
> (which might be exploitable by a malicious guest if buggy) and a  
> fs-over-virtio-serial client in the kernel (also exploitable).

What I meant was: perf-kernel puts the guest-name into every sample and
perf-userspace accesses /sys/kvm/guest_name/fs/ later to resolve the
symbols. I leave the question of how the guest-fs is exposed to the host
out of this discussion. We should discuss this separately.


>> An approach like: "The files are owned and only readable by the same
>> user that started the vm." might be a good start. So a user can measure
>> its own guests and root can measure all of them.
>
> That's not how sVirt works.  sVirt isolates a user's VMs from each  
> other, so if a guest breaks into qemu it can't break into other guests  
> owned by the same user.

If a vm breaks into qemu it can access the host file system which is the
bigger problem. In this case there is no isolation anymore. From that
context it can even kill other VMs of the same user independent of a
hypothetical /sys/kvm/.

>> Yeah that would be interesting information. But it is more related to
>> tracing than to pmu measurements.  The information which you
>> mentioned above are probably better captured by an extension of
>> trace-events to userspace.
>
> It's all related.  You start with perf, see a problem with mmio, call up  
> a histogram of mmio or interrupts or whatever, then zoom in on the  
> misbehaving device.

Yes, but it's different from the implementation point-of-view. For the
user it surely all plays together.

Joerg



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Alexander Graf
Avi Kivity wrote:
> On 03/24/2010 03:53 PM, Alexander Graf wrote:
>>
>>> Someone needs to know about the new guest to fetch its symbols.  Or do
>>> you want that part in the kernel too?
>>>  
>>
>> How about we add a virtio "guest file system access" device? The guest
>> would then expose its own file system using that device.
>>
>> On the host side this would simply be a -virtioguestfs
>> unix:/tmp/guest.fs and you'd get a unix socket that gives you full
>> access to the guest file system by using commands. I envision something
>> like:
>>
>
> The idea is to use a dedicated channel over virtio-serial.  If the
> channel is present the file server can serve files over it.

The file server being a kernel module inside the guest? We want to be
able to serve things as early and hassle free as possible, so in this
case I agree with Ingo that a kernel module is superior.

>
>> SEND: GET /proc/version
>> RECV: Linux version 2.6.27.37-0.1-default (ge...@buildhost) (gcc version
>> 4.3.2 [gcc-4_3-branch revision 141291] (SUSE Linux) ) #1 SMP 2009-10-15
>> 14:56:58 +0200
>>
>> Now all we need is integration in perf to enumerate virtual machines
>> based on libvirt. If you want to run qemu-kvm directly, just go with
>> --guestfs=/tmp/guest.fs and perf could fetch all required information
>> automatically.
>>
>> This should solve all issues while staying 100% in user space, right?
>>
>
> Yeah, needs a fuse filesystem to populate the host namespace (kind of
> sshfs over virtio-serial).

I don't see why we need a fuse filesystem. We can of course create one
later on. But for now all you need is a user connecting to that socket.


Alex




Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Avi Kivity

On 03/24/2010 03:53 PM, Alexander Graf wrote:
>> Someone needs to know about the new guest to fetch its symbols.  Or do
>> you want that part in the kernel too?
>
> How about we add a virtio "guest file system access" device? The guest
> would then expose its own file system using that device.
>
> On the host side this would simply be a -virtioguestfs
> unix:/tmp/guest.fs and you'd get a unix socket that gives you full
> access to the guest file system by using commands. I envision something
> like:

The idea is to use a dedicated channel over virtio-serial.  If the
channel is present the file server can serve files over it.

> SEND: GET /proc/version
> RECV: Linux version 2.6.27.37-0.1-default (ge...@buildhost) (gcc version
> 4.3.2 [gcc-4_3-branch revision 141291] (SUSE Linux) ) #1 SMP 2009-10-15
> 14:56:58 +0200
>
> Now all we need is integration in perf to enumerate virtual machines
> based on libvirt. If you want to run qemu-kvm directly, just go with
> --guestfs=/tmp/guest.fs and perf could fetch all required information
> automatically.
>
> This should solve all issues while staying 100% in user space, right?

Yeah, needs a fuse filesystem to populate the host namespace (kind of
sshfs over virtio-serial).


--
error compiling committee.c: too many arguments to function



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Avi Kivity

On 03/24/2010 03:46 PM, Joerg Roedel wrote:
> On Wed, Mar 24, 2010 at 03:05:02PM +0200, Avi Kivity wrote:
>> On 03/24/2010 02:50 PM, Joerg Roedel wrote:
>>> I don't want the tool for myself only. A typical perf user expects that
>>> it works transparently.
>>
>> A typical kvm user uses libvirt, so we can integrate it with that.
>
> Someone who uses libvirt and virt-manager by default is probably not
> interested in this feature at the same level a kvm developer is. And
> developers tend not to use libvirt for low-level kvm development.  A
> number of developers have stated in this thread already that they would
> appreciate a solution for guest enumeration that would not involve
> libvirt.

So would I.  But when I weigh the benefit of truly transparent
system-wide perf integration for users who don't use libvirt but do use
perf, versus the cost of transforming kvm from a single-process API to a
system-wide API with all the complications that I've listed, it comes
out in favour of not adding the API.

Those few users can probably script something to cover their needs.

>> Someone needs to know about the new guest to fetch its symbols.  Or do
>> you want that part in the kernel too?
>
> The samples will be tagged with the guest-name (and some additional
> information perf needs). Perf userspace can access the symbols then
> through /sys/kvm/guest0/fs/...

I take that as a yes?  So we need a virtio-serial client in the kernel
(which might be exploitable by a malicious guest if buggy) and a
fs-over-virtio-serial client in the kernel (also exploitable).

>>> Depends on how it is designed. A filesystem approach was already
>>> mentioned. We could create /sys/kvm/ for example to expose information
>>> about virtual machines to userspace. This would not require any new
>>> security hooks.
>>
>> Who would set the security context on those files?
>
> An approach like: "The files are owned and only readable by the same
> user that started the vm." might be a good start. So a user can measure
> its own guests and root can measure all of them.

That's not how sVirt works.  sVirt isolates a user's VMs from each
other, so if a guest breaks into qemu it can't break into other guests
owned by the same user.

The users who need this API (!libvirt and perf) probably don't care
about sVirt, but a new API must not break it.

>> Plus, we need cgroup support so you can't see one container's guests
>> from an unrelated container.
>
> cgroup support is an issue but we can solve that too. It's in general
> still less complex than going through the whole libvirt-qemu-kvm stack.

It's a tradeoff.  IMO, going through qemu is the better way, and also
provides more information.

>> Integration with qemu would allow perf to tell us that the guest is
>> hitting the interrupt status register of a virtio-blk device in pci
>> slot 5 (the information is already available through the kvm_mmio
>> trace event, but only qemu can decode it).
>
> Yeah that would be interesting information. But it is more related to
> tracing than to pmu measurements.
> The information which you mentioned above is probably better
> captured by an extension of trace-events to userspace.

It's all related.  You start with perf, see a problem with mmio, call up
a histogram of mmio or interrupts or whatever, then zoom in on the
misbehaving device.


--
error compiling committee.c: too many arguments to function



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Alexander Graf
Avi Kivity wrote:
> On 03/24/2010 02:50 PM, Joerg Roedel wrote:
>>
>>> You can always provide the kernel and module paths as command line
>>> parameters.  It just won't be transparently usable, but if you're using
>>> qemu from the command line, presumably you can live with that.
>>>  
>> I don't want the tool for myself only. A typical perf user expects that
>> it works transparently.
>>
>
> A typical kvm user uses libvirt, so we can integrate it with that.
>
>>>> Could be easily done using notifier chains already in the kernel.
>>>> Probably implemented with much less than 100 lines of additional code.
>>>
>>> And a userspace interface for that.
>>>  
>> Not necessarily. The perf event is configured to measure systemwide kvm
>> by userspace. The kernel side of perf takes care that it stays
>> system-wide even with added vm instances. So in this case the consumer
>> for the notifier would be the perf kernel part. No userspace interface
>> required.
>>
>
> Someone needs to know about the new guest to fetch its symbols.  Or do
> you want that part in the kernel too?


How about we add a virtio "guest file system access" device? The guest
would then expose its own file system using that device.

On the host side this would simply be a -virtioguestfs
unix:/tmp/guest.fs and you'd get a unix socket that gives you full
access to the guest file system by using commands. I envision something
like:

SEND: GET /proc/version
RECV: Linux version 2.6.27.37-0.1-default (ge...@buildhost) (gcc version
4.3.2 [gcc-4_3-branch revision 141291] (SUSE Linux) ) #1 SMP 2009-10-15
14:56:58 +0200

Now all we need is integration in perf to enumerate virtual machines
based on libvirt. If you want to run qemu-kvm directly, just go with
--guestfs=/tmp/guest.fs and perf could fetch all required information
automatically.

This should solve all issues while staying 100% in user space, right?
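
A minimal client sketch for the exchange above, assuming a very simple
framing that the proposal does not actually specify: the client sends
one "GET <path>" line and the server answers with the file contents and
then closes the connection. The function names are hypothetical and
/tmp/guest.fs is just the example socket path from this mail.

```python
import socket


def fetch(sock, guest_path):
    """Send 'GET <path>' on a connected socket and read the reply until EOF."""
    sock.sendall(("GET %s\n" % guest_path).encode("utf-8"))
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:  # server closed the connection: reply is complete
            break
        chunks.append(data)
    return b"".join(chunks)


def fetch_from_guest(socket_path, guest_path):
    """Connect to the guest's unix socket and fetch one file."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        return fetch(sock, guest_path)
```

With such a helper, something like fetch_from_guest("/tmp/guest.fs",
"/proc/version") would be all perf needs before any fuse layer exists.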


Alex



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Joerg Roedel
On Wed, Mar 24, 2010 at 03:05:02PM +0200, Avi Kivity wrote:
> On 03/24/2010 02:50 PM, Joerg Roedel wrote:

>> I don't want the tool for myself only. A typical perf user expects that
>> it works transparently.
>
> A typical kvm user uses libvirt, so we can integrate it with that.

Someone who uses libvirt and virt-manager by default is probably not
interested in this feature at the same level a kvm developer is. And
developers tend not to use libvirt for low-level kvm development.  A
number of developers have stated in this thread already that they would
appreciate a solution for guest enumeration that would not involve
libvirt.

> Someone needs to know about the new guest to fetch its symbols.  Or do  
> you want that part in the kernel too?

The samples will be tagged with the guest-name (and some additional
information perf needs). Perf userspace can access the symbols then
through /sys/kvm/guest0/fs/...

>> Depends on how it is designed. A filesystem approach was already
>> mentioned. We could create /sys/kvm/ for example to expose information
>> about virtual machines to userspace. This would not require any new
>> security hooks.
>
> Who would set the security context on those files?

An approach like: "The files are owned and only readable by the same
user that started the vm." might be a good start. So a user can measure
its own guests and root can measure all of them.
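
Spelled out, the rule is plain owner-only file permissions. A small
sketch of the check, with may_measure and may_measure_path as
illustrative names only; it deliberately ignores security labels and
cgroups, which are the open questions in this thread.

```python
import os
import stat


def may_measure(uid, owner_uid, mode):
    """Owner-only rule: root sees everything, others only their own entries."""
    if uid == 0:
        return True
    return uid == owner_uid and bool(mode & stat.S_IRUSR)


def may_measure_path(uid, path):
    """Apply the rule to an existing file, e.g. a per-guest 'name' entry."""
    st = os.stat(path)
    return may_measure(uid, st.st_uid, st.st_mode)
```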

> Plus, we need cgroup  support so you can't see one container's guests
> from an unrelated container.

cgroup support is an issue but we can solve that too. It's in general
still less complex than going through the whole libvirt-qemu-kvm stack.

> Integration with qemu would allow perf to tell us that the guest is
> hitting the interrupt status register of a virtio-blk device in pci
> slot 5 (the information is already available through the kvm_mmio
> trace event, but  only qemu can decode it).

Yeah that would be interesting information. But it is more related to
tracing than to pmu measurements.
The information which you mentioned above is probably better
captured by an extension of trace-events to userspace.

Joerg



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Avi Kivity

On 03/24/2010 02:50 PM, Joerg Roedel wrote:



You can always provide the kernel and module paths as command line
parameters.  It just won't be transparently usable, but if you're using
qemu from the command line, presumably you can live with that.
 

I don't want the tool for myself only. A typical perf user expects that
it works transparent.
   


A typical kvm user uses libvirt, so we can integrate it with that.


Could be easily done using notifier chains already in the kernel.
Probably implemented with much less than 100 lines of additional code.
   

And a userspace interface for that.
 

Not necessarily. The perf event is configured to measure systemwide kvm
by userspace. The kernel side of perf takes care that it stays
system-wide even with added vm instances. So in this case the consumer
for the notifier would be the perf kernel part. No userspace interface
required.
   


Someone needs to know about the new guest to fetch its symbols.  Or do 
you want that part in the kernel too?



If we make an API, I'd like it to be generally useful.
 

Thats hard to do at this point since we don't know what people will use
it for. We should keep it simple in the beginning and add new features
as they are requested and make sense in this context.
   


IMO this use case is to rare to warrant its own API, especially as there 
are alternatives.



It's a total headache.  For example, we'd need security module hooks to
determine access permissions.  So far we managed to avoid that since kvm
doesn't allow you to access any information beyond what you provided it
directly.
 

Depends on how it is designed. A filesystem approach was already
mentioned. We could create /sys/kvm/ for example to expose information
about virtual machines to userspace. This would not require any new
security hooks.
   


Who would set the security context on those files?  Plus, we need cgroup 
support so you can't see one container's guests from an unrelated container.



Copying the objects is a one time cost.  If you run perf for more than a
second or two, it would fetch and cache all of the data.  It's really
the same problem with non-guest profiling, only magnified a bit.
 

I don't think we can cache filesystem data of a running guest on the
host. It is too hard to keep such a cache coherent.
   


I don't see any choice.  The guest can change its symbols at any time 
(say by kexec), without any notification.



Other userspaces can also provide this functionality, like they have to
provide disk, network, and display emulation.  The kernel is not a huge
library.
 

If two userspaces run in parallel what is the single instance where perf
can get a list of guests from?
   


I don't know.  Surely that's solvable though.


kvm.ko has only a small subset of the information that is used to define
a guest.
 

The subset is not small. It contains all guest vcpus, the complete
interrupt routing hardware emulation and manages event the guests
memory.
   


It doesn't contain most of the mmio and pio address space.  Integration 
with qemu would allow perf to tell us that the guest is hitting the 
interrupt status register of a virtio-blk device in pci slot 5 (the 
information is already available through the kvm_mmio trace event, but 
only qemu can decode it).
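
As an illustration of the decoding step mentioned above, here is a minimal Python sketch of how a raw address from a trace event could be mapped back to a device register. It is not qemu code: `Region`, `AddressMap`, the base address 0xc000, and the register table are all hypothetical and chosen only for the example.

```python
import bisect
from typing import NamedTuple, Optional


class Region(NamedTuple):
    """One emulated address range (hypothetical layout, for illustration only)."""
    start: int
    size: int
    device: str
    registers: dict  # offset within the region -> register name


class AddressMap:
    """Sorted list of regions, searched by start address."""

    def __init__(self, regions):
        self._regions = sorted(regions, key=lambda region: region.start)
        self._starts = [region.start for region in self._regions]

    def decode(self, addr: int) -> Optional[str]:
        # Find the last region whose start is <= addr.
        index = bisect.bisect_right(self._starts, addr) - 1
        if index < 0:
            return None
        region = self._regions[index]
        offset = addr - region.start
        if offset >= region.size:
            return None  # addr falls in a gap after the region
        register = region.registers.get(offset, f"offset {offset:#x}")
        return f"{region.device}: {register}"


# Made-up example values: a single device with one named register.
address_map = AddressMap([
    Region(0xC000, 0x40, "virtio-blk (pci slot 5)", {0x13: "interrupt status"}),
])
print(address_map.decode(0xC013))
print(address_map.decode(0xD000))
```

The point of the sketch is only that the address-to-device table lives with whoever set up the emulated devices, which in this discussion is the userspace side.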


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Joerg Roedel
On Wed, Mar 24, 2010 at 02:08:17PM +0200, Avi Kivity wrote:
> On 03/24/2010 01:59 PM, Joerg Roedel wrote:

> You can always provide the kernel and module paths as command line  
> parameters.  It just won't be transparently usable, but if you're using  
> qemu from the command line, presumably you can live with that.

I don't want the tool for myself only. A typical perf user expects that
it works transparently.

>> Could be easily done using notifier chains already in the kernel.
>> Probably implemented with much less than 100 lines of additional code.
>
> And a userspace interface for that.

Not necessarily. The perf event is configured to measure systemwide kvm
by userspace. The kernel side of perf takes care that it stays
system-wide even with added vm instances. So in this case the consumer
for the notifier would be the perf kernel part. No userspace interface
required.

> If we make an API, I'd like it to be generally useful.

That's hard to do at this point since we don't know what people will use
it for. We should keep it simple in the beginning and add new features
as they are requested and make sense in this context.

> It's a total headache.  For example, we'd need security module hooks to  
> determine access permissions.  So far we managed to avoid that since kvm  
> doesn't allow you to access any information beyond what you provided it  
> directly.

Depends on how it is designed. A filesystem approach was already
mentioned. We could create /sys/kvm/ for example to expose information
about virtual machines to userspace. This would not require any new
security hooks.

> Copying the objects is a one time cost.  If you run perf for more than a  
> second or two, it would fetch and cache all of the data.  It's really  
> the same problem with non-guest profiling, only magnified a bit.

I don't think we can cache filesystem data of a running guest on the
host. It is too hard to keep such a cache coherent.

>>> Other userspaces can also provide this functionality, like they have to
>>> provide disk, network, and display emulation.  The kernel is not a huge
>>> library.

If two userspaces run in parallel what is the single instance where perf
can get a list of guests from?

> kvm.ko has only a small subset of the information that is used to define  
> a guest.

The subset is not small. It contains all guest vcpus, the complete
interrupt routing hardware emulation and even manages the guest's
memory.

Joerg
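
The notifier idea discussed in this message (the kernel side of perf as the only consumer, so no new userspace interface) can be sketched in plain Python. This is a user-space toy, not kernel code and not an existing KVM interface: `VmNotifierChain`, `PerfKvmCollector`, and the event names are hypothetical.

```python
from typing import Callable, List

# Hypothetical event names, used only in this sketch.
VM_CREATED = "created"
VM_DESTROYED = "destroyed"


class VmNotifierChain:
    """Toy stand-in for a notifier chain: an ordered list of callbacks."""

    def __init__(self) -> None:
        self._callbacks: List[Callable[[str, str], None]] = []

    def register(self, callback: Callable[[str, str], None]) -> None:
        self._callbacks.append(callback)

    def call_chain(self, event: str, vm_name: str) -> None:
        # Every registered consumer sees every event, in registration order.
        for callback in self._callbacks:
            callback(event, vm_name)


class PerfKvmCollector:
    """Hypothetical consumer that keeps a system-wide measurement system-wide."""

    def __init__(self, chain: VmNotifierChain, existing_vms) -> None:
        # Guests started before the monitor are picked up at setup time.
        self.monitored = set(existing_vms)
        chain.register(self.on_vm_event)

    def on_vm_event(self, event: str, vm_name: str) -> None:
        # Guests started or stopped after the monitor arrive as events.
        if event == VM_CREATED:
            self.monitored.add(vm_name)
        elif event == VM_DESTROYED:
            self.monitored.discard(vm_name)


chain = VmNotifierChain()
collector = PerfKvmCollector(chain, existing_vms=["kvm0"])
chain.call_chain(VM_CREATED, "kvm1")
chain.call_chain(VM_DESTROYED, "kvm0")
print(sorted(collector.monitored))
```

The sketch covers both cases raised in the thread: guests that exist before monitoring starts and guests created afterwards.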



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Avi Kivity

On 03/24/2010 01:59 PM, Joerg Roedel wrote:

On Wed, Mar 24, 2010 at 06:57:47AM +0200, Avi Kivity wrote:
   

On 03/23/2010 08:21 PM, Joerg Roedel wrote:
 

This enumeration is a very small and non-intrusive feature. Making it
aware of namespaces is easy too.

   

It's easier (and safer and all the other boring bits) not to do it at
all in the kernel.
 

For the KVM stack it doesn't matter where it is implemented. It is as
easy in qemu or libvirt as in the kernel. I also don't see big risks. On
the perf side and for its users it is a lot easier to have this in the
kernel.
I for example always use plain qemu when running kvm guests and never
used libvirt. The only central entity I have here is the kvm kernel
modules. I don't want to start using it only to be able to use perf kvm.
   


You can always provide the kernel and module paths as command line 
parameters.  It just won't be transparently usable, but if you're using 
qemu from the command line, presumably you can live with that.



Who would be the consumer of such notifications? A 'perf kvm list' can
live without I guess. If we need them later we can still add them.
   

System-wide monitoring needs to work equally well for guests started
before or after the monitor.
 

Could be easily done using notifier chains already in the kernel.
Probably implemented with much less than 100 lines of additional code.
   


And a userspace interface for that.


Even disregarding that, if you introduce  an API, people will start
using it and complaining if it's incomplete.
 

There is nothing wrong with that. We only need to define what this API
should be used for to prevent uncontrolled growth. It could be an
instrumentation-only API for example.
   


If we make an API, I'd like it to be generally useful.

It's a total headache.  For example, we'd need security module hooks to 
determine access permissions.  So far we managed to avoid that since kvm 
doesn't allow you to access any information beyond what you provided it 
directly.




My statement was not limited to enumeration, I should have been more
clear about that. The guest filesystem access-channel is another
affected part. The 'perf kvm top' command will access the guest
filesystem regularly and going over qemu would be more overhead here.

   

Why?  Also, the real cost would be accessing the filesystem, not copying
data over qemu.
 

When measuring cache-misses any additional (and in this case
unnecessary) copy-overhead makes the results less accurate.
   


Copying the objects is a one time cost.  If you run perf for more than a 
second or two, it would fetch and cache all of the data.  It's really 
the same problem with non-guest profiling, only magnified a bit.



Providing this in the KVM module directly also has the benefit that it
would work out-of-the-box with different userspaces too.  Or do we want
to limit 'perf kvm' to the libvirt-qemu-kvm software stack?
   

Other userspaces can also provide this functionality, like they have to
provide disk, network, and display emulation.  The kernel is not a huge
library.
 

This has nothing to do with a library. It is about entity and resource
management which is what os kernels are about. The virtual machine is
the entity (similar to a process) and we want to add additional access
channels and names to it.
   


kvm.ko has only a small subset of the information that is used to define 
a guest.


--
error compiling committee.c: too many arguments to function



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Paolo Bonzini

On 03/22/2010 08:13 AM, Avi Kivity wrote:


(btw, why are you interested in desktop-on-desktop?  one use case is
developers, which don't really need fancy GUIs; a second is people who
test out distributions, but that doesn't seem to be a huge population;
and a third is people running Windows for some application that doesn't
run on Linux - hopefully a small category as well.


This third category is pretty well served by virt-manager.  It has its 
quirks and shortcomings, but at least it exists.


Paolo


Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Joerg Roedel
On Wed, Mar 24, 2010 at 06:57:47AM +0200, Avi Kivity wrote:
> On 03/23/2010 08:21 PM, Joerg Roedel wrote:
>> This enumeration is a very small and non-intrusive feature. Making it
>> aware of namespaces is easy too.
>>
>
> It's easier (and safer and all the other boring bits) not to do it at  
> all in the kernel.

For the KVM stack it doesn't matter where it is implemented. It is as
easy in qemu or libvirt as in the kernel. I also don't see big risks. On
the perf side and for its users it is a lot easier to have this in the
kernel.
I for example always use plain qemu when running kvm guests and never
used libvirt. The only central entity I have here is the kvm kernel
modules. I don't want to start using it only to be able to use perf kvm.

>> Who would be the consumer of such notifications? A 'perf kvm list' can
>> live without I guess. If we need them later we can still add them.
>
> System-wide monitoring needs to work equally well for guests started  
> before or after the monitor.

Could be easily done using notifier chains already in the kernel.
Probably implemented with much less than 100 lines of additional code.

> Even disregarding that, if you introduce  an API, people will start
> using it and complaining if it's incomplete.

There is nothing wrong with that. We only need to define what this API
should be used for to prevent uncontrolled growth. It could be an
instrumentation-only API for example.

>> My statement was not limited to enumeration, I should have been more
>> clear about that. The guest filesystem access-channel is another
>> affected part. The 'perf kvm top' command will access the guest
>> filesystem regularly and going over qemu would be more overhead here.
>>
>
> Why?  Also, the real cost would be accessing the filesystem, not copying  
> data over qemu.

When measuring cache-misses any additional (and in this case
unnecessary) copy-overhead makes the results less accurate.

>> Providing this in the KVM module directly also has the benefit that it
>> would work out-of-the-box with different userspaces too.  Or do we want
>> to limit 'perf kvm' to the libvirt-qemu-kvm software stack?
>
> Other userspaces can also provide this functionality, like they have to  
> provide disk, network, and display emulation.  The kernel is not a huge  
> library.

This has nothing to do with a library. It is about entity and resource
management which is what os kernels are about. The virtual machine is
the entity (similar to a process) and we want to add additional access
channels and names to it.

Joerg



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Andi Kleen
Avi Kivity  writes:

> On 03/24/2010 09:38 AM, Andi Kleen wrote:
>>> If you're profiling a single guest it makes more sense to do this from
>>> inside the guest - you can profile userspace as well as the kernel.
>>>  
>> I'm interested in debugging the guest without guest cooperation.
>>
>> In many cases qemu's new gdb stub works for that, but in some cases
>> I would prefer instruction/branch traces over standard gdb style
>> debugging.
>>
>
> Isn't gdb supposed to be able to use branch traces? 

AFAIK not. The ptrace interface is only used by idb I believe.
I might be wrong on that.

Not sure if there is even a remote protocol command for 
branch traces either.

There's a concept of "tracepoints" in the protocol, but it
doesn't quite match.

> It makes sense to
> expose them via the gdb stub then.  Not to say an external tool
> doesn't make sense.

Ok that would work for me too. As long as I can set start/stop
triggers and pipe the log somewhere it's fine for me.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only.


Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Avi Kivity

On 03/24/2010 09:38 AM, Andi Kleen wrote:
>> If you're profiling a single guest it makes more sense to do this from
>> inside the guest - you can profile userspace as well as the kernel.
>
> I'm interested in debugging the guest without guest cooperation.
>
> In many cases qemu's new gdb stub works for that, but in some cases
> I would prefer instruction/branch traces over standard gdb style
> debugging.

Isn't gdb supposed to be able to use branch traces?  It makes sense to
expose them via the gdb stub then.  Not to say an external tool doesn't
make sense.



--
error compiling committee.c: too many arguments to function



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-24 Thread Andi Kleen
> If you're profiling a single guest it makes more sense to do this from 
> inside the guest - you can profile userspace as well as the kernel.

I'm interested in debugging the guest without guest cooperation.

In many cases qemu's new gdb stub works for that, but in some cases
I would prefer instruction/branch traces over standard gdb style
debugging.

I used to use that very successfully with simulators in the past
for some hard bugs.

-Andi
-- 
a...@linux.intel.com -- Speaking for myself only.


Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-23 Thread Avi Kivity

On 03/24/2010 07:09 AM, Andi Kleen wrote:
> Joerg Roedel  writes:
>> Sidenote: I really think we should come to a conclusion about the
>>   concept. KVM integration into perf is a very useful feature to
>>   analyze virtualization workloads.
>
> Agreed. I especially would like to see instruction/branch tracing
> working this way.  This would give a lot of the benefits of a simulator on
> a real CPU.

If you're profiling a single guest it makes more sense to do this from
inside the guest - you can profile userspace as well as the kernel.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-23 Thread Andi Kleen
Joerg Roedel  writes:
>
> Sidenote: I really think we should come to a conclusion about the
>   concept. KVM integration into perf is a very useful feature to
> analyze virtualization workloads.

Agreed. I especially would like to see instruction/branch tracing
working this way.  This would give a lot of the benefits of a simulator on
a real CPU.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only.


Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-23 Thread Avi Kivity

On 03/23/2010 08:21 PM, Joerg Roedel wrote:

On Tue, Mar 23, 2010 at 06:39:58PM +0200, Avi Kivity wrote:
   

On 03/23/2010 04:06 PM, Joerg Roedel wrote:
 
   

And this system wide entity is the kvm module. It creates instances of
'struct kvm' and destroys them. I see no problem if we just attach a
name to every instance with a good default value like kvm0, kvm1 ... or
guest0, guest1 ... User-space can override the name if it wants. The kvm
module takes care about the names being unique.

   

So, two users can't have a guest named MyGuest each?  What about
namespace support?  There's a lot of work in virtualizing all kernel
namespaces, you're adding to that.
 

This enumeration is a very small and non-intrusive feature. Making it
aware of namespaces is easy too.
   


It's easier (and safer and all the other boring bits) not to do it at 
all in the kernel.



What about notifications when guests  are added or removed?
 

Who would be the consumer of such notifications? A 'perf kvm list' can
live without I guess. If we need them later we can still add them.
   


System-wide monitoring needs to work equally well for guests started 
before or after the monitor.  Even disregarding that, if you introduce 
an API, people will start using it and complaining if it's incomplete.


The equivalent functionality for network interfaces is in netlink.


This is very much the same as network card numbering is implemented in
the kernel.
Forcing perf to talk to qemu or even libvirt produces too much overhead
imho. Instrumentation only produces useful results with low overhead.

   

It's a setup cost only.
 

My statement was not limited to enumeration, I should have been more
clear about that. The guest filesystem access-channel is another
affected part. The 'perf kvm top' command will access the guest
filesystem regularly and going over qemu would be more overhead here.
   


Why?  Also, the real cost would be accessing the filesystem, not copying 
data over qemu.



Providing this in the KVM module directly also has the benefit that it
would work out-of-the-box with different userspaces too.  Or do we want
to limit 'perf kvm' to the libvirt-qemu-kvm software stack?
   


Other userspaces can also provide this functionality, like they have to 
provide disk, network, and display emulation.  The kernel is not a huge 
library.



Sidenote: I really think we should come to a conclusion about the
   concept. KVM integration into perf is a very useful feature to
  analyze virtualization workloads.

   


Agreed.

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-23 Thread Javier Guerra Giraldez
On Tue, Mar 23, 2010 at 2:21 PM, Joerg Roedel  wrote:
> On Tue, Mar 23, 2010 at 06:39:58PM +0200, Avi Kivity wrote:
>> So, two users can't have a guest named MyGuest each?  What about
>> namespace support?  There's a lot of work in virtualizing all kernel
>> namespaces, you're adding to that.
>
> This enumeration is a very small and non-intrusive feature. Making it
> aware of namespaces is easy too.

an outsider's comment: this path leads to a filesystem... which could
be a very nice idea.  it could have a directory for each VM, with
pseudo-files with all the guest's status, and even the memory it's
using.  perf could simply watch those files.   in fact, such a
filesystem could be the main userlevel/kernel interface.

but i'm sure such a layout was considered (and rejected) very early in
the KVM design.  i don't think there's anything new to make it more
desirable than it was back then.


-- 
Javier
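
The filesystem layout suggested above (one directory per VM, with pseudo-files for its status) can be illustrated with a small Python sketch. It uses a temporary directory as a stand-in; no such KVM filesystem is implied to exist, and `publish_guest`, `list_guests`, and the file names `vcpus` and `state` are hypothetical.

```python
import os
import tempfile


def publish_guest(root, name, status):
    """Create one directory per guest with one small file per status field."""
    guest_dir = os.path.join(root, name)
    os.makedirs(guest_dir)
    for key, value in status.items():
        with open(os.path.join(guest_dir, key), "w") as handle:
            handle.write(f"{value}\n")


def list_guests(root):
    """Enumerate guests the way a monitoring tool might: by walking the tree."""
    guests = {}
    for name in sorted(os.listdir(root)):
        guest_dir = os.path.join(root, name)
        if not os.path.isdir(guest_dir):
            continue
        guests[name] = {}
        for key in sorted(os.listdir(guest_dir)):
            with open(os.path.join(guest_dir, key)) as handle:
                guests[name][key] = handle.read().strip()
    return guests


# The temporary directory stands in for the hypothetical mount point.
with tempfile.TemporaryDirectory() as root:
    publish_guest(root, "guest0", {"vcpus": 2, "state": "running"})
    publish_guest(root, "guest1", {"vcpus": 4, "state": "paused"})
    snapshot = list_guests(root)
print(snapshot)
```

With such a layout, access control would fall back on ordinary file permissions, which is the property argued over elsewhere in the thread.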


Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-23 Thread Peter Zijlstra
On Tue, 2010-03-23 at 19:21 +0100, Joerg Roedel wrote:

> Sidenote: I really think we should come to a conclusion about the
>   concept. KVM integration into perf is a very useful feature to
> analyze virtualization workloads.

I always start my things with bare kvm. It would be very unwelcome to
mandate libvirt, or for that matter running a particular userspace in
the guest.


Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-23 Thread Joerg Roedel
On Tue, Mar 23, 2010 at 06:39:58PM +0200, Avi Kivity wrote:
> On 03/23/2010 04:06 PM, Joerg Roedel wrote:

>> And this system wide entity is the kvm module. It creates instances of
>> 'struct kvm' and destroys them. I see no problem if we just attach a
>> name to every instance with a good default value like kvm0, kvm1 ... or
>> guest0, guest1 ... User-space can override the name if it wants. The kvm
>> module takes care about the names being unique.
>>
>
> So, two users can't have a guest named MyGuest each?  What about  
> namespace support?  There's a lot of work in virtualizing all kernel  
> namespaces, you're adding to that.

This enumeration is a very small and non-intrusive feature. Making it
aware of namespaces is easy too.

> What about notifications when guests  are added or removed?

Who would be the consumer of such notifications? A 'perf kvm list' can
live without I guess. If we need them later we can still add them.

>> This is very much the same as network card numbering is implemented in
>> the kernel.
>> Forcing perf to talk to qemu or even libvirt produces too much overhead
>> imho. Instrumentation only produces useful results with low overhead.
>>
>
> It's a setup cost only.

My statement was not limited to enumeration, I should have been more
clear about that. The guest filesystem access-channel is another
affected part. The 'perf kvm top' command will access the guest
filesystem regularly and going over qemu would be more overhead here.
Providing this in the KVM module directly also has the benefit that it
would work out-of-the-box with different userspaces too.  Or do we want
to limit 'perf kvm' to the libvirt-qemu-kvm software stack?

Sidenote: I really think we should come to a conclusion about the
  concept. KVM integration into perf is a very useful feature to
  analyze virtualization workloads.

Thanks,

Joerg



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-23 Thread Avi Kivity

On 03/23/2010 04:06 PM, Joerg Roedel wrote:
> On Mon, Mar 22, 2010 at 05:06:17PM -0500, Anthony Liguori wrote:
>> There always needs to be a system wide entity.  There are two ways to
>> enumerate instances from that system wide entity.  You can centralize
>> the creation of instances and thereby maintain a list of current
>> instances.  You can also allow instances to be created in a
>> decentralized manner and provide a standard mechanism for instances to
>> register themselves with the system wide entity.
>
> And this system wide entity is the kvm module. It creates instances of
> 'struct kvm' and destroys them. I see no problem if we just attach a
> name to every instance with a good default value like kvm0, kvm1 ... or
> guest0, guest1 ... User-space can override the name if it wants. The kvm
> module takes care about the names being unique.

So, two users can't have a guest named MyGuest each?  What about
namespace support?  There's a lot of work in virtualizing all kernel
namespaces, you're adding to that.  What about notifications when guests
are added or removed?

> This is very much the same as network card numbering is implemented in
> the kernel.
> Forcing perf to talk to qemu or even libvirt produces too much overhead
> imho. Instrumentation only produces useful results with low overhead.

It's a setup cost only.

--
error compiling committee.c: too many arguments to function



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-23 Thread Anthony Liguori

On 03/23/2010 04:07 AM, Avi Kivity wrote:

On 03/23/2010 12:06 AM, Anthony Liguori wrote:
Having qemu enumerate guests one way or another is not a good idea 
IMO since it is focused on one guest and doesn't have a system-wide 
entity.



There always needs to be a system wide entity.  There are two ways to 
enumerate instances from that system wide entity.  You can centralize 
the creation of instances and thereby maintain a list of current
instances.  You can also allow instances to be created in a 
decentralized manner and provide a standard mechanism for instances 
to register themselves with the system wide entity.


IOW, it's the difference between asking libvirtd to exec(qemu) vs 
allowing a user to exec(qemu) and having qemu connect to a well known 
unix domain socket for libvirt to tell libvirtd that it exists.


The latter approach has a number of advantages.  libvirt already
supports both models.  The former is the '/system' uri and the latter
is the '/session' uri.


What I'm proposing, is to use the host file system as the system wide 
entity instead of libvirtd.  libvirtd can monitor the host file 
system to participate in these activities but ultimately, moving this 
functionality out of libvirtd means that it becomes the standard 
mechanism for all qemu instances regardless of how they're launched.


I don't like dropping sockets into the host filesystem, especially as 
they won't be cleaned up on abnormal exit.  I also think this breaks 
our 'mechanism, not policy' policy.  Someone may want to do something 
weird with qemu that doesn't work well with this.


The approach I've taken (which I accidentally committed and reverted) 
was to set this up as the default qmp device much like we have a default 
monitor device.  A user is capable of overriding this by manually 
specifying a qmp device or by disabling defaults.


We could allow starting monitors from the global configuration file, 
so a distribution can do this if it wants, but I don't think we should 
do this ourselves by default.


I've looked at making default devices globally configurable.  We'll get 
there but I think that's orthogonal to setting up a useful default qmp 
device.


Regards,

Anthony Liguori
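
The decentralized model described in this message (each instance announces itself on a well-known unix domain socket) can be sketched roughly as follows. This is not how qemu, QMP, or libvirt implement it: the socket name `vm-registry.sock`, the "pid name" message format, and the helper functions are assumptions made for the example, and datagram sockets are used instead of connections only to keep the sketch free of threads.

```python
import os
import socket
import tempfile


def open_registry(path):
    """System-wide entity: bind a datagram socket at a well-known path."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    sock.bind(path)
    return sock


def announce_instance(path, name, pid):
    """What each independently started instance would do at startup."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(f"{pid} {name}".encode(), path)


def collect(registry, count):
    """Read `count` announcements and build a pid -> name table."""
    guests = {}
    for _ in range(count):
        pid, name = registry.recv(4096).decode().split(" ", 1)
        guests[int(pid)] = name
    return guests


# A temporary directory stands in for the well-known location, so the socket
# file is removed when the block exits.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "vm-registry.sock")
    registry = open_registry(path)
    announce_instance(path, "guest-a", 1001)
    announce_instance(path, "guest-b", 1002)
    guests = collect(registry, 2)
    registry.close()
print(guests)
```

The sketch sidesteps the cleanup concern raised in the quoted text by using a temporary directory; a socket at a fixed path would stay behind after an abnormal exit.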



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-23 Thread Joerg Roedel
On Mon, Mar 22, 2010 at 05:06:17PM -0500, Anthony Liguori wrote:
> There always needs to be a system wide entity.  There are two ways to  
> enumerate instances from that system wide entity.  You can centralize  
> the creation of instances and thereby maintain a list of current
> instances.  You can also allow instances to be created in a  
> decentralized manner and provide a standard mechanism for instances to  
> register themselves with the system wide entity.

And this system wide entity is the kvm module. It creates instances of
'struct kvm' and destroys them. I see no problem if we just attach a
name to every instance with a good default value like kvm0, kvm1 ... or
guest0, guest1 ... User-space can override the name if it wants. The kvm
module takes care about the names being unique.
This is very much the same as network card numbering is implemented in
the kernel.
Forcing perf to talk to qemu or even libvirt produces too much overhead
imho. Instrumentation only produces useful results with low overhead.

Joerg
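
The naming scheme proposed in this message (default names such as kvm0, kvm1, an optional override from user-space, uniqueness enforced centrally) can be sketched in Python. This only models the bookkeeping; `GuestNameRegistry` and its methods are hypothetical and do not correspond to any interface in kvm.ko.

```python
import itertools


class GuestNameRegistry:
    """Toy model of a central entity that hands out unique guest names."""

    def __init__(self, prefix="kvm"):
        self._prefix = prefix
        self._counter = itertools.count()
        self._names = set()

    def create(self, requested=None):
        """Register a guest under `requested`, or under the next free default."""
        if requested is not None:
            if requested in self._names:
                raise ValueError(f"guest name already in use: {requested}")
            name = requested
        else:
            name = self._next_default()
        self._names.add(name)
        return name

    def _next_default(self):
        # Skip defaults that user-space has already claimed explicitly.
        for index in self._counter:
            candidate = f"{self._prefix}{index}"
            if candidate not in self._names:
                return candidate

    def destroy(self, name):
        self._names.discard(name)


registry = GuestNameRegistry()
first = registry.create()            # default name
custom = registry.create("MyGuest")  # user-space override
taken = registry.create("kvm1")      # override that collides with a default
second = registry.create()           # next default skips the taken name
print(first, custom, taken, second)
```

Because the namespace in this sketch is flat and system-wide, a second request for a name that is already taken fails with `ValueError`.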



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-23 Thread Bernd Petrovitsch
On Mon, 2010-03-22 at 21:21 +0100, Ingo Molnar wrote:
[...]
> Have you made thoughts about why that might be so?

Yes.

Foreword: I assume with "GUI" you mean "a user interface for the
classical desktop user with next to no interest in learning details or
basics".
That doesn't mean the classical desktop user is silly, stupid or
otherwise handicapped - it's just the lack of interest and/or time.

> I think it's because of what i outlined above - that you are trying to apply 
> the "UNIX secret" to GUIs - and that is a mistake.

No, it's the very same mechanism. But you just have to start at the
correct point. In the kernel/device driver world, you start at the
device.
And in the GUI world, you better start at the GUI (and not some kernel
API, library API, GUI tool or toolchains or anywhere else).

> A good GUI is almost at the _exact opposite spectrum_ of good command-line 
> tool: tightly integrated, with 'layering violations' designed into it all 
> over 
> the place:
>
>   look i can paste the text from an editor straight into a firefox form. I
>   didn't go through any hierarchy of layers, i just took the shortest path
>   between the apps!
> 
> In other words: in a GUI the output controls the design, for command-line 
ACK, because you have to make the GUI understandable to the intended users.
If that means "hiding 90% of all possibilities and features", you just
hide them.
Of course, the user of such a UI is quite limited and doesn't use much of
the functionality - because s/he can't access it through the GUI - (but
presenting 100% - or even 40% - doesn't help either as s/he won't
understand it anyways).

> tools the design controls the output.
ACK, because the user in this case (which is most of the time a
developer, sys-admin, or similar techie) *wants* a 1:1 picture of the
underlying model because s/he already *knows* the underlying model (and
is willing and able to adapt the own workflow to the underlying models).

> It is no wonder Unix always had its problems with creating good GUIs that are 

ACK. The cliché Unix person doesn't come from the "GUI world". So most
of them are "trained" and used to look what's there and improve on it.

> efficient to humans. A good GUI works like the human brain, and the human 
> brain does not mind 'layering violations' when that gets it a more efficient 
> result.

If this is the case, the layering/structure/design of the GUI is (very)
badly defined/chosen (for whatever reason).

[ Most probably because some seasoned software developer designed the
GUI-app *without* designing (and testing!) the GUI (or more to the
point: the look - how does it look like - and feel - how does it behave,
what are the possible workflows, ... - of it) first. ]

Bernd
-- 
Bernd Petrovitsch  Email : be...@petrovitsch.priv.at
 LUGA : http://www.luga.at



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-23 Thread Antoine Martin

On 03/23/2010 05:13 PM, Kevin Wolf wrote:

Am 22.03.2010 23:06, schrieb Anthony Liguori:
   

On 03/22/2010 02:47 PM, Avi Kivity wrote:
 

Having qemu enumerate guests one way or another is not a good idea IMO
since it is focused on one guest and doesn't have a system-wide entity.
   

There always needs to be a system wide entity.  There are two ways to
enumerate instances from that system wide entity.  You can centralize
the creation of instances and thereby maintain a list of current
instances.  You can also allow instances to be created in a
decentralized manner and provide a standard mechanism for instances to
register themselves with the system wide entity.

IOW, it's the difference between asking libvirtd to exec(qemu) vs
allowing a user to exec(qemu) and having qemu connect to a well known
unix domain socket for libvirt to tell libvirtd that it exists.
 

I think the latter is exactly what I would want for myself. I do see the
advantages of having a central instance, but I really don't want to
bother with libvirt configuration files or even GUIs just to get an
ad-hoc VM up when I can simply run "qemu -hda hd.img -m 1024". Let alone
that I usually want to have full control over qemu, including monitor
access and small details available as command line options.

I know that I'm not the average user with these requirements, but still
I am one user and do have these requirements. If I could just install
libvirt, continue using qemu as I always did and libvirt picked my VMs
up for things like global enumeration, that would be more or less the
optimal thing for me.
   

+1
And it would also make it more likely that users like us would convert 
to libvirt in the long run, by providing an easy and integrated 
transition path.
I've had another look at libvirt, and one of the things that is holding 
me back is the cost of moving existing scripts to libvirt. If it could 
just pick up what I have (at least in part), then I don't have to.


Antoine


> Kevin




Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-23 Thread Kevin Wolf
On 22.03.2010 23:06, Anthony Liguori wrote:
> On 03/22/2010 02:47 PM, Avi Kivity wrote:
>> Having qemu enumerate guests one way or another is not a good idea IMO 
>> since it is focused on one guest and doesn't have a system-wide entity.
> 
> There always needs to be a system wide entity.  There are two ways to 
> enumerate instances from that system wide entity.  You can centralize 
> the creation of instances and thereby maintain a list of current 
> instances.  You can also allow instances to be created in a 
> decentralized manner and provide a standard mechanism for instances to 
> register themselves with the system wide entity.
> 
> IOW, it's the difference between asking libvirtd to exec(qemu) vs 
> allowing a user to exec(qemu) and having qemu connect to a well known 
> unix domain socket for libvirt to tell libvirtd that it exists.

I think the latter is exactly what I would want for myself. I do see the
advantages of having a central instance, but I really don't want to
bother with libvirt configuration files or even GUIs just to get an
ad-hoc VM up when I can simply run "qemu -hda hd.img -m 1024". Let alone
that I usually want to have full control over qemu, including monitor
access and small details available as command line options.

I know that I'm not the average user with these requirements, but still
I am one user and do have these requirements. If I could just install
libvirt, continue using qemu as I always did and libvirt picked my VMs
up for things like global enumeration, that would be more or less the
optimal thing for me.

Kevin


Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-23 Thread Olivier Galibert
On Mon, Mar 22, 2010 at 03:54:37PM +0100, Ingo Molnar wrote:
> Yes, i thought Qemu would be a prime candidate to be the baseline for 
> tools/kvm/, but i guess that has become socially impossible now after this 
> flamewar. It's not a big problem in the big scheme of things: tools/kvm/ is 
> best grown up from a small towards larger size anyway ...

I'm curious, where would you put the limit?  Let's imagine a tools/kvm
appears, be it qemu or not, that's outside the scope of my question.
Would you put the legacy PC bios in there (seabios I guess)?  The EFI
bios? The windows-compiled paravirtual drivers? The Xorg paravirtual
DDX ?  Mesa (which includes the pv gallium drivers)? The
libvirt-equivalent? The GUI?

That's not a rhetorical question btw, I really wonder where the limit
should be.

  OG.


Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-23 Thread Avi Kivity

On 03/23/2010 12:06 AM, Anthony Liguori wrote:
>> Having qemu enumerate guests one way or another is not a good idea
>> IMO since it is focused on one guest and doesn't have a system-wide
>> entity.
>
> There always needs to be a system wide entity.  There are two ways to
> enumerate instances from that system wide entity.  You can centralize
> the creation of instances and thereby maintain a list of current
> instances.  You can also allow instances to be created in a
> decentralized manner and provide a standard mechanism for instances to
> register themselves with the system wide entity.
>
> IOW, it's the difference between asking libvirtd to exec(qemu) vs
> allowing a user to exec(qemu) and having qemu connect to a well known
> unix domain socket for libvirt to tell libvirtd that it exists.
>
> The latter approach has a number of advantages.  libvirt already
> supports both models.  The former is the '/system' uri and the latter
> is the '/session' uri.
>
> What I'm proposing, is to use the host file system as the system wide
> entity instead of libvirtd.  libvirtd can monitor the host file system
> to participate in these activities but ultimately, moving this
> functionality out of libvirtd means that it becomes the standard
> mechanism for all qemu instances regardless of how they're launched.


I don't like dropping sockets into the host filesystem, especially as 
they won't be cleaned up on abnormal exit.  I also think this breaks our 
'mechanism, not policy' policy.  Someone may want to do something weird 
with qemu that doesn't work well with this.


We could allow starting monitors from the global configuration file, so 
a distribution can do this if it wants, but I don't think we should do 
this ourselves by default.
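
The cleanup concern can be made concrete. What follows is only a minimal sketch, assuming a hypothetical directory that contains nothing but Unix sockets; `is_stale` and `sweep` are illustrative names, not functions of qemu or libvirt. A socket file outlives the process that bound it, and a reader can only find out by trying to connect.

```python
import os
import socket
import tempfile


def is_stale(path):
    """Return True if the socket file exists but nothing is listening on it."""
    probe = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        probe.connect(path)
    except ConnectionRefusedError:
        return True   # left over from a process that went away
    finally:
        probe.close()
    return False


def sweep(directory):
    """Unlink every stale socket in `directory`; return the removed names."""
    removed = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if is_stale(path):
            os.unlink(path)
            removed.append(name)
    return removed


if __name__ == "__main__":
    directory = tempfile.mkdtemp()

    live = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    live.bind(os.path.join(directory, "live.sock"))
    live.listen()

    dead = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    dead.bind(os.path.join(directory, "dead.sock"))
    dead.close()  # simulates an abnormal exit: the file is never unlinked

    print(sweep(directory))
    live.close()
```

Such a sweep is racy by nature (an instance may start between the probe and the unlink), which is part of why leaving this policy to a distribution rather than baking it in may be the safer default.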


--
error compiling committee.c: too many arguments to function



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-22 Thread Anthony Liguori

On 03/22/2010 02:47 PM, Avi Kivity wrote:
> On 03/22/2010 09:27 PM, Ingo Molnar wrote:
>>> If your position basically boils down to, we can't trust userspace
>>> and we can always trust the kernel, I want to eliminate any
>>> userspace path, then I can't really help you out.
>>
>> Why would you want to 'help me out'? I can tell a good solution from
>> a bad one just fine.
>
> You are basically making a kernel implementation a requirement,
> instead of something that follows from the requirement.
>
>> You should instead read the long list of disadvantages above, invert
>> them and list them as advantages for the kernel-based vcpu enumeration
>> solution, apply common sense and go admit to yourself that indeed in
>> this situation a kernel provided enumeration of vcpu contexts is the
>> most robust solution.
>
> Having qemu enumerate guests one way or another is not a good idea IMO
> since it is focused on one guest and doesn't have a system-wide entity.


There always needs to be a system wide entity.  There are two ways to 
enumerate instances from that system wide entity.  You can centralize 
the creation of instances and thereby maintain a list of current 
instances.  You can also allow instances to be created in a 
decentralized manner and provide a standard mechanism for instances to 
register themselves with the system wide entity.


IOW, it's the difference between asking libvirtd to exec(qemu) vs 
allowing a user to exec(qemu) and having qemu connect to a well known 
unix domain socket for libvirt to tell libvirtd that it exists.
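
The decentralized model can be sketched in a few lines. This is an illustration under stated assumptions, not libvirt's actual protocol: the socket path, the one-line "pid name" message format and the function names are all hypothetical. The system-wide entity owns a well-known Unix socket, and each instance announces itself after it has been started by whoever wanted to start it.

```python
import os
import socket
import tempfile


def open_registry(path):
    """Create the well-known listening socket owned by the system-wide entity."""
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)
    server.listen()
    return server


def register_instance(path, name):
    """Called by a freshly started instance: announce pid and name on one line."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(path)
        client.sendall(f"{os.getpid()} {name}\n".encode())


def accept_registrations(server, count):
    """Read `count` announcements and return them as (pid, name) tuples."""
    instances = []
    for _ in range(count):
        conn, _addr = server.accept()
        with conn, conn.makefile("r") as reader:
            pid, name = reader.readline().strip().split(" ", 1)
        instances.append((int(pid), name))
    return instances


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "registry.sock")
    registry = open_registry(path)
    register_instance(path, "guest-a")  # in reality, a separate process
    register_instance(path, "guest-b")
    print(accept_registrations(registry, 2))
    registry.close()
```

In the centralized model the registry would instead do the exec() itself and record the child, so no announcement step is needed.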


The latter approach has a number of advantages.  libvirt already supports 
both models.  The former is the '/system' uri and the latter is the 
'/session' uri.


What I'm proposing, is to use the host file system as the system wide 
entity instead of libvirtd.  libvirtd can monitor the host file system 
to participate in these activities but ultimately, moving this 
functionality out of libvirtd means that it becomes the standard 
mechanism for all qemu instances regardless of how they're launched.
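
As a rough sketch of what "the host file system as the system wide entity" could look like: the directory layout, the `.sock` suffix and the function names below are assumptions made for illustration, not an existing qemu interface. Each instance binds a socket in a shared directory, and any tool enumerates instances by scanning that directory, with no daemon in between.

```python
import os
import socket
import stat
import tempfile


def publish_instance(directory, name):
    """An instance makes itself discoverable by binding <name>.sock."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.bind(os.path.join(directory, name + ".sock"))
    sock.listen()
    return sock


def enumerate_instances(directory):
    """List instance names by scanning the directory for socket files."""
    names = []
    for entry in os.scandir(directory):
        mode = entry.stat(follow_symlinks=False).st_mode
        if stat.S_ISSOCK(mode) and entry.name.endswith(".sock"):
            names.append(entry.name[: -len(".sock")])
    return sorted(names)


if __name__ == "__main__":
    directory = tempfile.mkdtemp()
    guests = [publish_instance(directory, n) for n in ("guest-a", "guest-b")]
    print(enumerate_instances(directory))
    for guest in guests:
        guest.close()
```

A management daemon could watch the same directory for changes instead of polling it, which is the sense in which it would "participate" rather than own the list.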


Regards,

Anthony Liguori

> A userspace system-wide entity will work just as well as kernel
> implementation, without its disadvantages.






Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-22 Thread Daniel P. Berrange
On Tue, Mar 23, 2010 at 03:00:28AM +0700, Antoine Martin wrote:
> On 03/23/2010 02:15 AM, Anthony Liguori wrote:
> >On 03/22/2010 12:55 PM, Avi Kivity wrote:
> >>>Lets look at the ${HOME}/.qemu/qmp/ enumeration method suggested by 
> >>>Anthony.
> >>>There's numerous ways that this can break:
> >>
> >>I don't like it either.  We have libvirt for enumerating guests.
> >
> >We're stuck in a rut with libvirt and I think a lot of the 
> >dissatisfaction with qemu is rooted in that.  It's not libvirt that's 
> >the problem, but the relationship between qemu and libvirt.
> +1
> The obvious reason why so many people still use shell scripts rather 
> than libvirt is that it just doesn't provide what they need.
> Every time I've looked at it (and I've been looking for a better 
> solution for many years), it seems that it would have provided most of 
> the things I needed, but the remaining bits were unsolvable.

If you happen to remember what missing features prevented you choosing
libvirt, that would be invaluable information for us, to see if there
are quick wins that will help out. We got very useful feedback when
recently asking people this same question

http://rwmj.wordpress.com/2010/01/07/quick-quiz-what-stops-you-from-using-libvirt/

Allowing arbitrary passthrough of QEMU commands/args will solve some of
these issues, but certainly far from solving all of them. eg guest cut+
paste, host side control of guest screen resolution, easier x509/TLS 
configuration for remote management, soft reboot, Windows desktop support
for virt-manager, host network interface management/setup, etc

Regards,
Daniel
-- 
|: Red Hat, Engineering, London-o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org-o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|


Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-22 Thread Avi Kivity

On 03/22/2010 10:46 PM, Ingo Molnar wrote:
>>> You should instead read the long list of disadvantages above, invert them
>>> and list them as advantages for the kernel-based vcpu enumeration
>>> solution, apply common sense and go admit to yourself that indeed in this
>>> situation a kernel provided enumeration of vcpu contexts is the most
>>> robust solution.
>>
>> Having qemu enumerate guests one way or another is not a good idea IMO since
>> it is focused on one guest and doesn't have a system-wide entity.  A
>> userspace system-wide entity will work just as well as kernel
>> implementation, without its disadvantages.
>
> A system-wide user-space entity only solves one problem out of the 4 i listed,
> still leaving the other 3:
>
>  - Those special files can get corrupted, mis-setup, get out of sync, or can
>    be hard to discover.

That's a hard requirement anyway.  If it happens, we get massive data
loss.  Way more troubling than 'perf kvm top' doesn't work.  So consider
it fulfilled.

>  - Apps might start KVM vcpu instances without adhering to the
>    system-wide access method.

Then you don't get their symbol tables.  That happens anyway if the
symbol server is not installed, not running, or handing out fake data.
So we have to deal with that anyway.

>  - There is no guarantee for the system-wide process to reply to a request -
>    while the kernel can always guarantee an enumeration result. I dont want
>    'perf kvm' to hang or misbehave just because the system-wide entity has
>    hung.

When you press a key there is no guarantee no component along the way
will time out.

> Really, i think i have to give up and not try to convince you guys about this
> anymore - i dont think you are arguing constructively anymore and i dont want
> yet another pointless flamewar about this.
>
> Please consider 'perf kvm' scrapped indefinitely, due to lack of robust KVM
> instrumentation features: due to lack of robust+universal vcpu/guest
> enumeration and due to lack of robust+universal symbol access on the KVM side.
> It was a really promising feature IMO and i invested two days of arguments
> into it trying to find a workable solution, but it was not to be.

I am not going to push libvirt or a subset thereof into the kernel for
'perf kvm'.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-22 Thread Ingo Molnar

* Avi Kivity  wrote:

> On 03/22/2010 09:27 PM, Ingo Molnar wrote:
> >
> >> If your position basically boils down to, we can't trust userspace
> >> and we can always trust the kernel, I want to eliminate any
> >> userspace path, then I can't really help you out.
> >
> > Why would you want to 'help me out'? I can tell a good solution from a bad 
> > one just fine.
> 
> You are basically making a kernel implementation a requirement, instead of 
> something that follows from the requirement.

No, i'm not.

> > You should instead read the long list of disadvantages above, invert them 
> > and list them as advantages for the kernel-based vcpu enumeration 
> > solution, apply common sense and go admit to yourself that indeed in this 
> > situation a kernel provided enumeration of vcpu contexts is the most 
> > robust solution.
> 
> Having qemu enumerate guests one way or another is not a good idea IMO since 
> it is focused on one guest and doesn't have a system-wide entity.  A 
> userspace system-wide entity will work just as well as kernel 
> implementation, without its disadvantages.

A system-wide user-space entity only solves one problem out of the 4 i listed, 
still leaving the other 3:

 - Those special files can get corrupted, mis-setup, get out of sync, or can
   be hard to discover.

 - Apps might start KVM vcpu instances without adhering to the
   system-wide access method.

 - There is no guarantee for the system-wide process to reply to a request -
   while the kernel can always guarantee an enumeration result. I dont want
   'perf kvm' to hang or misbehave just because the system-wide entity has 
   hung.

Really, i think i have to give up and not try to convince you guys about this 
anymore - i dont think you are arguing constructively anymore and i dont want 
yet another pointless flamewar about this.

Please consider 'perf kvm' scrapped indefinitely, due to lack of robust KVM 
instrumentation features: due to lack of robust+universal vcpu/guest 
enumeration and due to lack of robust+universal symbol access on the KVM side. 
It was a really promising feature IMO and i invested two days of arguments 
into it trying to find a workable solution, but it was not to be.

Thanks,

Ingo


Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-22 Thread Avi Kivity

On 03/22/2010 10:35 PM, Ingo Molnar wrote:
>>> And your point is that such vcpus should be excluded from profiling just
>>> because they fall outside the Qemu/libvirt umbrella?
>>>
>>> That is a ridiculous position.
>>
>> Non-guest vcpus will not be able to provide Linux-style symbols.
>
> And why do you say that it makes no sense to profile them?

It makes sense to profile them, but you don't need to contact their
userspace tool for that.

> Also, why do you define 'guest vcpus' to be 'Qemu started guest vcpus'? If
> some other KVM using project (which you encouraged just a few mails ago)
> starts a vcpu we still want to be able to profile them.

Maybe it should provide a mechanism for libvirt to list it.

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-22 Thread Avi Kivity

On 03/22/2010 10:32 PM, Ingo Molnar wrote:
> * Anthony Liguori wrote:
>> On 03/22/2010 02:22 PM, Ingo Molnar wrote:
>>>> Transitive had a product that was using a KVM context to run their
>>>> binary translator which allowed them full access to the host
>>>> processes virtual address space range.  In this case, there is no
>>>> kernel and there are no devices.
>>>
>>> And your point is that such vcpus should be excluded from profiling just
>>> because they fall outside the Qemu/libvirt umbrella?
>>
>> You don't instrument it the way you'd instrument an operating system so no,
>> you don't want it to show up in perf kvm top.
>
> Erm, why not? It's executing a virtualized CPU, so sure it makes sense to
> allow the profiling of it!

It may not make sense to have symbol tables for it, for example it isn't
generated from source code but from binary code for another architecture.

Of course, just showing addresses is fine, but you don't need qemu for that.

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-22 Thread Avi Kivity

On 03/22/2010 10:29 PM, Ingo Molnar wrote:
> * Avi Kivity wrote:
>>> I think you didnt understand my point. I am talking about 'perf kvm top'
>>> hanging if Qemu hangs.
>>
>> Use non-blocking I/O, report that guest as dead.  No point in profiling it,
>> it isn't making any progress.
>
> Erm, at what point do i decide that a guest is 'dead' versus 'just lagged due
> to lots of IO' ?
   


qemu shouldn't block due to I/O (it does now, but there is work to fix 
it).  Of course it could be swapping or other things.


Pick a timeout, everything we do has timeouts these days.  It's the 
price we pay for protection: if you put something where a failure can't 
hurt you, you have to be prepared for failure, and you might have false 
alarms.


Is it so horrible for 'perf kvm top'?  No user data loss will happen, 
surely?


On the other hand, if it's in the kernel and it fails, you will lose 
service or perhaps data.
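
The "pick a timeout" approach might look roughly like the sketch below. It uses a per-socket timeout rather than fully non-blocking I/O, and the `ping` request, the 4096-byte read and the function names are assumptions for illustration, not a real qemu monitor protocol. The profiler asks each guest's userspace for a reply and marks the guest unresponsive instead of hanging with it.

```python
import socket


def query_guest(sock, request, timeout):
    """Send `request`; return the reply, or None if none arrives within `timeout` seconds."""
    sock.settimeout(timeout)
    try:
        sock.sendall(request)
        return sock.recv(4096)
    except socket.timeout:
        return None


def classify(guests, timeout=0.2):
    """Map each guest name to 'alive' or 'unresponsive' without blocking forever."""
    report = {}
    for name, sock in guests.items():
        reply = query_guest(sock, b"ping\n", timeout)
        report[name] = "alive" if reply else "unresponsive"
    return report


if __name__ == "__main__":
    # socketpair() stands in for a connection to each guest's userspace.
    fast, fast_peer = socket.socketpair()
    hung, _hung_peer = socket.socketpair()
    fast_peer.sendall(b"pong\n")  # this peer answers, the other never does
    print(classify({"fast": fast, "hung": hung}))
```

The false-alarm case is visible here too: a guest that is merely slower than the chosen timeout is reported exactly like one that is hung.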



> Also, do you realize that you increase complexity (the use of non-blocking
> IO), just to protect against something that wouldnt happen if the right
> solution was used in the first place?

It's a tradeoff.  Increasing the kernel code size vs. increasing
userspace size.

>>> With a proper in-kernel enumeration the kernel would always guarantee the
>>> functionality, even if the vcpu does not make progress (i.e. it's "hung").
>>>
>>> With this implemented in Qemu we lose that kind of robustness guarantee.
>>
>> If qemu has a bug in the resource enumeration code, you can't profile one
>> guest.  If the kernel has a bug in the resource enumeration code, the system
>> either panics or needs to be rebooted later.
>
> This is really simple code, not rocket science. If there's a bug in it we'll
> fix it. On the other hand a 500KLOC+ piece of Qemu code has lots of places to
> hang, so that is a large cross section.

   


The kernel has tons of very simple code (and some very complex code as 
well), and tons of -stable updates as well.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-22 Thread Avi Kivity

On 03/22/2010 10:21 PM, Ingo Molnar wrote:
> * Alexander Graf wrote:
>>> Furthermore, another negative effect is that many times features are
>>> implemented not in their technically best way, but in a way to keep them
>>> local to the project that originates them. This is done to keep deployment
>>> latencies and general contribution overhead down to a minimum. The moment
>>> you have to work with yet another project, the overhead adds up.
>>
>> I disagree there. Keeping things local and self-contained has been the UNIX
>> secret. It works really well as long as the boundaries are well defined.
>
> The 'UNIX secret' works for text driven pipelined commands where we are
> essentially programming via narrow ASCII input of mathematical logic.
>
> It doesnt work for a GUI that is a 2D/3D environment of millions of pixels,
> shaped by human visual perception of prettiness, familiarity and efficiency.

Modularization is needed when a project exceeds the average developer's
capacity.  For kvm, it is logical to separate privileged cpu
virtualization, from guest virtualization, from host management, from
cluster management.

>> The problem we're facing is that we're simply lacking an active GUI /
>> desktop user development community. We have desktop users, but nobody feels
>> like tackling the issue of doing a great GUI project while talking to
>> qemu-devel about his needs.
>
> Have you thought about why that might be so?
>
> I think it's because of what i outlined above - that you are trying to apply
> the "UNIX secret" to GUIs - and that is a mistake.
>
> A good GUI is almost at the _exact opposite spectrum_ of good command-line
> tool: tightly integrated, with 'layering violations' designed into it all over
> the place:
>
>    look i can paste the text from an editor straight into a firefox form. I
>    didnt go through any hierarchy of layers, i just took the shortest path
>    between the apps!

Nope.  You copied text from one application into the clipboard (or
selection, or PRIMARY, or whatever) and pasted text from the clipboard
to another application.  If firefox and your editor had to interact
directly, all would be lost.

See - there was a global (for the session) third party, and it wasn't
the kernel.

> In other words: in a GUI the output controls the design, for command-line
> tools the design controls the output.

Not in GUIs that I've seen the internals of.

> It is no wonder Unix always had its problems with creating good GUIs that are
> efficient to humans. A good GUI works like the human brain, and the human
> brain does not mind 'layering violations' when that gets it a more efficient
> result.

The problem is that only developers are involved, not people who
understand human-computer interaction (in many cases, not human-human
interaction either).  Another problem is that a good GUI takes a lot of
work so you need a lot of committed resources.  A third problem is that
it isn't a lot of fun, at least not the 20% of the work that takes 800%
of the time.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-22 Thread Ingo Molnar

* Avi Kivity  wrote:

> On 03/22/2010 09:22 PM, Ingo Molnar wrote:
> >
> >> Transitive had a product that was using a KVM context to run their binary 
> >> translator which allowed them full access to the host processes virtual 
> >> address space range.  In this case, there is no kernel and there are no 
> >> devices.
> >
> > And your point is that such vcpus should be excluded from profiling just 
> > because they fall outside the Qemu/libvirt umbrella?
> >
> > That is a ridiculous position.
> >
> 
> Non-guest vcpus will not be able to provide Linux-style symbols.

And why do you say that it makes no sense to profile them?

Also, why do you define 'guest vcpus' to be 'Qemu started guest vcpus'? If 
some other KVM using project (which you encouraged just a few mails ago) 
starts a vcpu we still want to be able to profile them.

Ingo


Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-22 Thread Ingo Molnar

* Anthony Liguori  wrote:

> On 03/22/2010 02:22 PM, Ingo Molnar wrote:
> >>Transitive had a product that was using a KVM context to run their
> >>binary translator which allowed them full access to the host
> >>processes virtual address space range.  In this case, there is no
> >>kernel and there are no devices.
> >
> > And your point is that such vcpus should be excluded from profiling just 
> > because they fall outside the Qemu/libvirt umbrella?
> 
> You don't instrument it the way you'd instrument an operating system so no, 
> you don't want it to show up in perf kvm top.

Erm, why not? It's executing a virtualized CPU, so sure it makes sense to 
allow the profiling of it!

It might not even be the weird case you mentioned, but some 
virtualization project competing with Qemu ...

So your argument is wrong on several technical levels, sorry.

Thanks,

Ingo


Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-22 Thread Ingo Molnar

* Avi Kivity  wrote:

> > I think you didnt understand my point. I am talking about 'perf kvm top' 
> > hanging if Qemu hangs.
> 
> Use non-blocking I/O, report that guest as dead.  No point in profiling it, 
> it isn't making any progress.

Erm, at what point do i decide that a guest is 'dead' versus 'just lagged due 
to lots of IO' ?

Also, do you realize that you increase complexity (the use of non-blocking 
IO), just to protect against something that wouldnt happen if the right 
solution was used in the first place?

> > With a proper in-kernel enumeration the kernel would always guarantee the 
> > functionality, even if the vcpu does not make progress (i.e. it's "hung").
> >
> > With this implemented in Qemu we lose that kind of robustness guarantee.
> 
> If qemu has a bug in the resource enumeration code, you can't profile one 
> guest.  If the kernel has a bug in the resource enumeration code, the system 
> either panics or needs to be rebooted later.

This is really simple code, not rocket science. If there's a bug in it we'll 
fix it. On the other hand a 500KLOC+ piece of Qemu code has lots of places to 
hang, so that is a large cross section.

Ingo


Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-22 Thread Ingo Molnar

* Alexander Graf  wrote:

> > Furthermore, another negative effect is that many times features are 
> > implemented not in their technically best way, but in a way to keep them 
> > local to the project that originates them. This is done to keep deployment 
> > latencies and general contribution overhead down to a minimum. The moment 
> > you have to work with yet another project, the overhead adds up.
> 
> I disagree there. Keeping things local and self-contained has been the UNIX 
> secret. It works really well as long as the boundaries are well defined.

The 'UNIX secret' works for text driven pipelined commands where we are 
essentially programming via narrow ASCII input of mathematical logic.

It doesnt work for a GUI that is a 2D/3D environment of millions of pixels, 
shaped by human visual perception of prettiness, familiarity and efficiency.

> The problem we're facing is that we're simply lacking an active GUI / 
> desktop user development community. We have desktop users, but nobody feels 
> like tackling the issue of doing a great GUI project while talking to 
> qemu-devel about his needs.

Have you thought about why that might be so?

I think it's because of what i outlined above - that you are trying to apply 
the "UNIX secret" to GUIs - and that is a mistake.

A good GUI is almost at the _exact opposite spectrum_ of good command-line 
tool: tightly integrated, with 'layering violations' designed into it all over 
the place:

  look i can paste the text from an editor straight into a firefox form. I
  didnt go through any hierarchy of layers, i just took the shortest path 
  between the apps!

In other words: in a GUI the output controls the design, for command-line 
tools the design controls the output.

It is no wonder Unix always had its problems with creating good GUIs that are 
efficient to humans. A good GUI works like the human brain, and the human 
brain does not mind 'layering violations' when that gets it a more efficient 
result.

> > So developers rather go for the quicker (yet inferior) hack within the 
> > sub-project they have best access to.
> 
> Well - not necessarily hacks. It's more about project boundaries. Nothing is 
> bad about that. You wouldn't want "ls" implemented in the Linux kernel 
> either, right? :-)

I guess you are talking to the wrong person as i actually have implemented ls 
functionality in the kernel, using async IO concepts and extreme threading ;-) 
It was a bit crazy, but was also the fastest FTP server ever running on this 
planet.

Ingo


Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-22 Thread Antoine Martin

On 03/23/2010 02:54 AM, Ingo Molnar wrote:
> * Alexander Graf wrote:
>> Yes. I think the point was that every layer in between brings potential
>> slowdown and loss of features.
>
> Exactly. The more 'fragmented' a project is into sub-projects, without a
> single, unified, functional reference implementation in the center of it, the
> longer it takes to fix 'unsexy' problems like trivial usability bugs.
>
> Furthermore, another negative effect is that many times features are
> implemented not in their technically best way, but in a way to keep them local
> to the project that originates them. This is done to keep deployment latencies
> and general contribution overhead down to a minimum. The moment you have to
> work with yet another project, the overhead adds up.
>
> So developers rather go for the quicker (yet inferior) hack within the
> sub-project they have best access to.
>
> Tell me this isnt happening in this space ;-)
   
Integration is hard: it requires a wider set of technical skills, and 
getting good test coverage becomes more difficult.
But I agree that it is worth the effort. kvm could reap large rewards 
from putting a greater emphasis on integration (a la vbox), no matter 
how it is achieved (I am cowardly not taking sides on implementation 
decisions like repository locations).


Antoine


Thanks,

Ingo


Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-22 Thread Avi Kivity

On 03/22/2010 10:06 PM, Ingo Molnar wrote:

* Avi Kivity  wrote:

   

On 03/22/2010 09:20 PM, Ingo Molnar wrote:
 

* Avi Kivity   wrote:

   

Let's look at the ${HOME}/.qemu/qmp/ enumeration method suggested by
Anthony. There are numerous ways that this can break:
   

I don't like it either.  We have libvirt for enumerating guests.
 

Which has pretty much the same problems as the ${HOME}/.qemu/qmp/ solution,
obviously.
   

It doesn't follow.  The libvirt daemon could/should own guests from all
users.  I don't know if it does so now, but nothing is preventing it
technically.
 

It's hard for me to argue against a hypothetical implementation, but all
user-space driven solutions for resource enumeration i've seen so far had
weaknesses that kernel-based solutions don't have.
   


Correct.  kernel-based solutions also have issues.


If qemu hangs, the guest hangs a few milliseconds later.
 

I think you didn't understand my point. I am talking about 'perf kvm top'
hanging if Qemu hangs.
   


Use non-blocking I/O, report that guest as dead.  No point in profiling 
it, it isn't making any progress.
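
A minimal Python sketch of the non-blocking approach suggested above, not
how perf or qemu actually do it. It assumes each guest exposes a monitor on
a Unix socket that sends a greeting on connect (QMP does this); the helper
name probe_guest and the half-second default are made up for illustration.

```python
import socket


def probe_guest(sock_path, timeout=0.5):
    """Return 'alive' if the monitor socket greets us within `timeout`, else 'dead'."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.settimeout(timeout)  # bounds both connect() and recv()
    try:
        s.connect(sock_path)
        data = s.recv(4096)  # wait for the greeting, but never longer than `timeout`
        return "alive" if data else "dead"
    except OSError:
        # Covers a missing socket, a refused connection and a timeout
        # (socket.timeout is a subclass of OSError).
        return "dead"
    finally:
        s.close()
```

The caller never blocks longer than the timeout per guest, so a hung qemu
shows up as one dead entry instead of stalling the whole tool.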



With a proper in-kernel enumeration the kernel would always guarantee the
functionality, even if the vcpu does not make progress (i.e. it's "hung").

With this implemented in Qemu we lose that kind of robustness guarantee.
   


If qemu has a bug in the resource enumeration code, you can't profile 
one guest.  If the kernel has a bug in the resource enumeration code, 
the system either panics or needs to be rebooted later.



And especially during development (when developers use instrumentation the
most) it is important to have robust instrumentation that does not hang along
with the Qemu process.
   


It's nice not to have kernel oopses either.  So when code can be in 
userspace, that's where it should be.



If qemu fails, you lose your guest.  If libvirt forgets about a
guest, you can't do anything with it any more.  These are more
serious problems than 'perf kvm' not working. [...]
 

How on earth can you justify a bug ("perf kvm top" hanging) by pointing out
that there are other bugs as well?
   


There's no reason for 'perf kvm top' to hang if some process is not 
responsive.  That would be a perf bug.



Basically you are arguing the equivalent of it being fine for a gdb session
to become unresponsive if the debugged task hangs. Fortunately ptrace is
kernel-based and it never 'hangs' if the user-space process hangs somewhere.
   


Neither gdb nor perf should hang.


This is an essential property of good instrumentation.

So the enumeration method you suggested is a poor, sub-par solution, simple
as that.
   


Or, you misunderstood it.


[...] Qemu and libvirt have to be robust anyway, we can rely on them.  Like
we have to rely on init, X, sshd, and a zillion other critical tools.
 

We can still profile any of those tools without the profiler breaking if the
debugged tool breaks ...
   


You can't profile without qemu.


By your argument it would be perfectly fine to implement /proc purely via
user-space, correct?
   

I would have preferred /proc to be implemented via syscalls called directly
from tools, and good tools written to expose the information in it.  When
computers were slower 'top' would spend tons of time opening and closing all
those tiny files and parsing them.  Of course the kernel needs to provide
the information.
 

(Then you'll be pleased to hear that perf has enabled exactly that, and that we
are working towards that precise usecase.)
   


Are you exporting /proc/pid data via the perf syscall?  If so, I think 
that's a good move.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-22 Thread Ingo Molnar

* Avi Kivity  wrote:

> On 03/22/2010 09:20 PM, Ingo Molnar wrote:
> >* Avi Kivity  wrote:
> >
> >>>Let's look at the ${HOME}/.qemu/qmp/ enumeration method suggested by
> >>>Anthony. There are numerous ways that this can break:
> >>I don't like it either.  We have libvirt for enumerating guests.
> >Which has pretty much the same problems as the ${HOME}/.qemu/qmp/ solution,
> >obviously.
> 
> It doesn't follow.  The libvirt daemon could/should own guests from all 
> users.  I don't know if it does so now, but nothing is preventing it 
> technically.

It's hard for me to argue against a hypothetical implementation, but all 
user-space driven solutions for resource enumeration i've seen so far had 
weaknesses that kernel-based solutions don't have.

> >>>  - Those special files can get corrupted, mis-setup, get out of sync,
> >>>    or can be hard to discover.
> >>>
> >>>  - The ${HOME}/.qemu/qmp/ solution suggested by Anthony has a very obvious
> >>>    design flaw: it is per user. When i'm root i'd like to query _all_
> >>>    current guest images, not just the ones started by root. A system might
> >>>    not even have a notion of '${HOME}'.
> >>>
> >>>  - Apps might start KVM vcpu instances without adhering to the
> >>>    ${HOME}/.qemu/qmp/ access method.
> >>- it doesn't work with nfs.
> >So out of a list of 4 disadvantages your reply is that you agree with 3?
> 
> I agree with 1-3, disagree with 4, and add 5.  Yes.
> 
> >>>  - There is no guarantee for the Qemu process to reply to a request -
> >>>    while the kernel can always guarantee an enumeration result. I don't
> >>>    want 'perf kvm' to hang or misbehave just because Qemu has hung.
> >>If qemu doesn't reply, your guest is dead anyway.
> >Erm, but i'm talking about a dead tool here. There's a world of a difference
> >between 'kvm top' not showing new entries (because the guest is dead), and
> >'perf kvm top' hanging due to Qemu hanging.
> 
> If qemu hangs, the guest hangs a few milliseconds later.

I think you didn't understand my point. I am talking about 'perf kvm top' 
hanging if Qemu hangs.

With a proper in-kernel enumeration the kernel would always guarantee the 
functionality, even if the vcpu does not make progress (i.e. it's "hung").

With this implemented in Qemu we lose that kind of robustness guarantee.

And especially during development (when developers use instrumentation the 
most) it is important to have robust instrumentation that does not hang along 
with the Qemu process.

> If qemu fails, you lose your guest.  If libvirt forgets about a
> guest, you can't do anything with it any more.  These are more
> serious problems than 'perf kvm' not working. [...]

How on earth can you justify a bug ("perf kvm top" hanging) by pointing out 
that there are other bugs as well?

Basically you are arguing the equivalent of it being fine for a gdb session 
to become unresponsive if the debugged task hangs. Fortunately ptrace is 
kernel-based and it never 'hangs' if the user-space process hangs somewhere.

This is an essential property of good instrumentation.

So the enumeration method you suggested is a poor, sub-par solution, simple 
as that.

> [...] Qemu and libvirt have to be robust anyway, we can rely on them.  Like 
> we have to rely on init, X, sshd, and a zillion other critical tools.

We can still profile any of those tools without the profiler breaking if the 
debugged tool breaks ...

> > By your argument it would be perfectly fine to implement /proc purely via 
> > user-space, correct?
> 
> I would have preferred /proc to be implemented via syscalls called directly 
> from tools, and good tools written to expose the information in it.  When 
> computers were slower 'top' would spend tons of time opening and closing all 
> those tiny files and parsing them.  Of course the kernel needs to provide 
> the information.

(Then you'll be pleased to hear that perf has enabled exactly that, and that we 
are working towards that precise usecase.)

Ingo


Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-22 Thread Antoine Martin

On 03/23/2010 02:15 AM, Anthony Liguori wrote:

On 03/22/2010 12:55 PM, Avi Kivity wrote:
Let's look at the ${HOME}/.qemu/qmp/ enumeration method suggested by 
Anthony.

There are numerous ways that this can break:


I don't like it either.  We have libvirt for enumerating guests.


We're stuck in a rut with libvirt and I think a lot of the 
dissatisfaction with qemu is rooted in that.  It's not libvirt that's 
the problem, but the relationship between qemu and libvirt.

+1
The obvious reason why so many people still use shell scripts rather 
than libvirt is that it just doesn't provide what they need.
Every time I've looked at it (and I've been looking for a better 
solution for many years), it seems that it would have provided most of 
the things I needed, but the remaining bits were unsolvable.


Shell scripts can be ugly, but you get total control.

Antoine
We add a feature to qemu and maybe after six months it gets exposed by 
libvirt.  Release time lines of the two projects complicate the 
situation further.  People that write GUIs are limited by libvirt 
because that's what they're told to use and when they need something 
simple, they're presented with first getting that feature implemented 
in qemu, then plumbed through libvirt.


It wouldn't be so bad if libvirt was basically a passthrough interface 
to qemu but it tries to model everything in a generic way which is 
more or less doomed to fail when you're adding lots of new features 
(as we are).


The list of things that libvirt doesn't support and won't any time 
soon is staggering.


libvirt serves an important purpose, but we need to do a better job in 
qemu with respect to usability.  We can't just punt to libvirt.


Regards,

Anthony Liguori



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-22 Thread Alexander Graf

On 22.03.2010, at 20:54, Ingo Molnar wrote:

> 
> * Alexander Graf  wrote:
> 
>> Yes. I think the point was that every layer in between brings potential 
>> slowdown and loss of features.
> 
> Exactly. The more 'fragmented' a project is into sub-projects, without a 
> single, unified, functional reference implementation in the center of it, the 
> longer it takes to fix 'unsexy' problems like trivial usability bugs.

I agree with that part. As previously stated there are few people working on qemu 
that would go and implement higher level things though. So some solution is 
needed there.

> Furthermore, another negative effect is that many times features are 
> implemented not in their technically best way, but in a way to keep them 
> local to the project that originates them. This is done to keep deployment 
> latencies and general contribution overhead down to a minimum. The moment 
> you have to work with yet another project, the overhead adds up.

I disagree there. Keeping things local and self-contained has been the UNIX 
secret. It works really well as long as the boundaries are well defined.

The problem we're facing is that we're simply lacking an active GUI / desktop 
user development community. We have desktop users, but nobody feels like 
tackling the issue of doing a great GUI project while talking to qemu-devel 
about his needs.

> So developers rather go for the quicker (yet inferior) hack within the 
> sub-project they have best access to.

Well - not necessarily hacks. It's more about project boundaries. Nothing is 
bad about that. You wouldn't want "ls" implemented in the Linux kernel either, 
right? :-)


Alex


Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-22 Thread Ingo Molnar

* Alexander Graf  wrote:

> Yes. I think the point was that every layer in between brings potential 
> slowdown and loss of features.

Exactly. The more 'fragmented' a project is into sub-projects, without a 
single, unified, functional reference implementation in the center of it, the 
longer it takes to fix 'unsexy' problems like trivial usability bugs.

Furthermore, another negative effect is that many times features are 
implemented not in their technically best way, but in a way to keep them local 
to the project that originates them. This is done to keep deployment latencies 
and general contribution overhead down to a minimum. The moment you have to 
work with yet another project, the overhead adds up.

So developers rather go for the quicker (yet inferior) hack within the 
sub-project they have best access to.

Tell me this isn't happening in this space ;-)

Thanks,

Ingo


Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-22 Thread Ingo Molnar

* Joerg Roedel  wrote:

> On Mon, Mar 22, 2010 at 05:32:15PM +0100, Ingo Molnar wrote:
> > I don't know how you can find the situation of Alpha comparable, which is a 
> > legacy architecture for which no new CPU was manufactured in the past ~10 
> > years.
> > 
> > The negative effects of physical obsolescence cannot be overcome even by 
> > the very best of development models ...
> 
> The maintainers of that architecture could at least continue to maintain it. 
> But that is not the case. Most newer syscalls are not available and overall 
> stability on alpha sucks (kernel crashed when I tried to start Xorg for 
> example) but nobody cares about it. Hardware is still around and there are 
> still some users of it.

You are arguing why maintainers do not act as you suggest, against the huge 
negative effects of physical obsolescence?

Please use common sense: they don't act because ... there are huge negative 
effects due to physical obsolescence?

No amount of development model engineering can offset that negative.

Thanks,

Ingo


Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-22 Thread Avi Kivity

On 03/22/2010 09:27 PM, Ingo Molnar wrote:



If your position basically boils down to, we can't trust userspace
and we can always trust the kernel, I want to eliminate any
userspace path, then I can't really help you out.
 

Why would you want to 'help me out'? I can tell a good solution from a bad one
just fine.
   


You are basically making a kernel implementation a requirement, instead 
of something that follows from the requirement.



You should instead read the long list of disadvantages above, invert them and
list them as advantages for the kernel-based vcpu enumeration solution, apply
common sense and go admit to yourself that indeed in this situation a kernel
provided enumeration of vcpu contexts is the most robust solution.
   


Having qemu enumerate guests one way or another is not a good idea IMO 
since it is focused on one guest and doesn't have a system-wide entity.  
A userspace system-wide entity will work just as well as kernel 
implementation, without its disadvantages.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-22 Thread Avi Kivity

On 03/22/2010 09:22 PM, Ingo Molnar wrote:



Transitive had a product that was using a KVM context to run their
binary translator which allowed them full access to the host
process's virtual address space range.  In this case, there is no
kernel and there are no devices.
 

And your point is that such vcpus should be excluded from profiling just
because they fall outside the Qemu/libvirt umbrella?

That is a ridiculous position.

   


Non-guest vcpus will not be able to provide Linux-style symbols.

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-22 Thread Avi Kivity

On 03/22/2010 09:20 PM, Ingo Molnar wrote:

* Avi Kivity  wrote:

   

Let's look at the ${HOME}/.qemu/qmp/ enumeration method suggested by
Anthony. There are numerous ways that this can break:
   

I don't like it either.  We have libvirt for enumerating guests.
 

Which has pretty much the same problems as the ${HOME}/.qemu/qmp/ solution,
obviously.
   


It doesn't follow.  The libvirt daemon could/should own guests from all 
users.  I don't know if it does so now, but nothing is preventing it 
technically.



  - Those special files can get corrupted, mis-setup, get out of sync, or can
be hard to discover.

  - The ${HOME}/.qemu/qmp/ solution suggested by Anthony has a very obvious
design flaw: it is per user. When i'm root i'd like to query _all_ current
guest images, not just the ones started by root. A system might not even
have a notion of '${HOME}'.

  - Apps might start KVM vcpu instances without adhering to the
${HOME}/.qemu/qmp/ access method.
   

- it doesn't work with nfs.
 

So out of a list of 4 disadvantages your reply is that you agree with 3?
   


I agree with 1-3, disagree with 4, and add 5.  Yes.
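
To make the per-user objection concrete, here is a rough Python sketch of
what a system-wide tool would have to do on top of the ${HOME}/.qemu/qmp/
scheme. Only the directory layout comes from this thread; the function names
are hypothetical and nothing here is an existing qemu or perf interface.

```python
import os
import pwd


def local_accounts():
    """Return (username, home) pairs from the system account database."""
    return [(pw.pw_name, pw.pw_dir) for pw in pwd.getpwall()]


def find_qmp_dirs(accounts):
    """Return (username, path) for every account that has a ~/.qemu/qmp directory."""
    found = []
    for name, home in accounts:
        path = os.path.join(home, ".qemu", "qmp")
        if os.path.isdir(path):  # False for missing or unreadable homes
            found.append((name, path))
    return found
```

Calling find_qmp_dirs(local_accounts()) as root walks every account the
account database knows about. It still misses guests whose owner has no
usable home directory and guests started by apps that never register there,
which are exactly points 2 and 3 of the list.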

   

  - There is no guarantee for the Qemu process to reply to a request - while
the kernel can always guarantee an enumeration result. I don't want 'perf
kvm' to hang or misbehave just because Qemu has hung.
   

If qemu doesn't reply, your guest is dead anyway.
 

Erm, but i'm talking about a dead tool here. There's a world of a difference
between 'kvm top' not showing new entries (because the guest is dead), and
'perf kvm top' hanging due to Qemu hanging.
   


If qemu hangs, the guest hangs a few milliseconds later.


So it's essentially 4 out of 4. Yet your reply isn't "Ingo you are right" but
"hey, too bad" ?
   


My reply is "you are right" (phrased earlier as "I don't like it either" 
meaning I agree with your dislike).  One of your criticisms was invalid, 
IMO, and I pointed it out.



Really, for such reasons user-space is pretty poor at doing system-wide
enumeration and resource management. Microkernels lost for a reason.
   

Take a look at your desktop, userspace is doing all of that everywhere, from
enumerating users and groups, to deciding how your disks are named.  The
kernel only provides the bare facilities.
 

We don't do that for robust system instrumentation, for heaven's sake!
   


If qemu fails, you lose your guest.  If libvirt forgets about a guest, 
you can't do anything with it any more.  These are more serious problems 
than 'perf kvm' not working.  Qemu and libvirt have to be robust anyway, 
we can rely on them.  Like we have to rely on init, X, sshd, and a 
zillion other critical tools.



By your argument it would be perfectly fine to implement /proc purely via
user-space, correct?
   


I would have preferred /proc to be implemented via syscalls called 
directly from tools, and good tools written to expose the information in 
it.  When computers were slower 'top' would spend tons of time opening 
and closing all those tiny files and parsing them.  Of course the kernel 
needs to provide the information.
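
For readers who have not looked at how a 'top'-like tool reads /proc, the
following Python sketch shows the open/read/parse/close cycle per process
that the paragraph above complains about. It is an illustration only (real
tools such as procps are written in C and read more files), and the function
names are made up.

```python
import os


def parse_stat(line):
    """Split one /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = line.index("(")
    right = line.rindex(")")
    pid = int(line[:left])
    comm = line[left + 1:right]
    state = line[right + 1:].split()[0]
    return pid, comm, state


def snapshot(proc="/proc"):
    """Open, read and parse one small file per process directory."""
    result = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc, entry, "stat"), errors="replace") as f:
                result.append(parse_stat(f.read()))
        except OSError:
            pass  # the process exited between listdir() and open()
    return result
```

Every refresh repeats this for every process, which is where the time went
on slower machines.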



You are committing several grave design mistakes here.
   

I am committing on the shoulders of giants.
 

Really, this is getting outright ridiculous. You agree with me that Anthony
suggested a technically inferior solution, yet you even seem to be proud of it
and are joking about it?
   


The bit above this was:


 Really, for such reasons user-space is pretty poor at doing system-wide
 enumeration and resource management. Microkernels lost for a reason.
 


In every Linux system userspace is doing or proxying much of the 
enumeration and resource management.  So if enumerating guests in 
userspace is a mistake, then I am not alone in making it.



And _you_ are complaining about lkml-style hard-talk discussions?
   


There is a difference between joking and insulting people.  I enjoy 
jokes but I dislike being insulted.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-22 Thread Alexander Graf

On 22.03.2010, at 20:31, Daniel P. Berrange wrote:

> On Mon, Mar 22, 2010 at 02:15:35PM -0500, Anthony Liguori wrote:
>> On 03/22/2010 12:55 PM, Avi Kivity wrote:
>>>> Lets look at the ${HOME}/.qemu/qmp/ enumeration method suggested by 
>>>> Anthony.
>>>> There's numerous ways that this can break:
>>> 
>>> I don't like it either.  We have libvirt for enumerating guests.
>> 
>> We're stuck in a rut with libvirt and I think a lot of the 
>> dissatisfaction with qemu is rooted in that.  It's not libvirt that's 
>> the problem, but the relationship between qemu and libvirt.
>> 
>> We add a feature to qemu and maybe after six months it gets exposed by 
>> libvirt.  Release time lines of the two projects complicate the 
>> situation further.  People that write GUIs are limited by libvirt 
>> because that's what they're told to use and when they need something 
>> simple, they're presented with first getting that feature implemented in 
>> qemu, then plumbed through libvirt.
> 
> That is somewhat unfair as a blanket statement! 
> 
> While some features have had a long time delay & others are not supported
> at all, in many cases we have had zero delay. We have been supporting QMP,
> qdev, vhost-net since before the patches for those features were even merged
> in QEMU GIT! It varies depending on how closely QEMU & libvirt people have
> been working together on a feature, and on how strongly end users are 
> demanding the features. 

Yes. I think the point was that every layer in between brings potential 
slowdown and loss of features.

Hopefully this will go away with QMP. By then people can decide if they want to 
be hypervisor agnostic (libvirt) or tightly coupled with qemu (QMP). The best 
of both worlds would of course be a QMP pass-through in libvirt. No idea if 
that's easily possible.

Either way, things are improving. What people see at the end is virt-manager 
though. And if you compare it feature-wise as well as looks-wise, vbox is simply 
superior. Several features are lacking in the lower layers too (pv graphics, 
always-working absolute pointers, clipboard sharing, ...).

That said it doesn't mean we should resign. It means we know which areas to 
work on :-). And we know that our problem is not the kernel/userspace 
interface, but the qemu and above interfaces.

Alex


Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-22 Thread Anthony Liguori

On 03/22/2010 02:31 PM, Daniel P. Berrange wrote:

On Mon, Mar 22, 2010 at 02:15:35PM -0500, Anthony Liguori wrote:
   

On 03/22/2010 12:55 PM, Avi Kivity wrote:
 

Lets look at the ${HOME}/.qemu/qmp/ enumeration method suggested by
Anthony.
There's numerous ways that this can break:
 

I don't like it either.  We have libvirt for enumerating guests.
   

We're stuck in a rut with libvirt and I think a lot of the
dissatisfaction with qemu is rooted in that.  It's not libvirt that's
the problem, but the relationship between qemu and libvirt.

We add a feature to qemu and maybe after six months it gets exposed by
libvirt.  Release time lines of the two projects complicate the
situation further.  People that write GUIs are limited by libvirt
because that's what they're told to use and when they need something
simple, they're presented with first getting that feature implemented in
qemu, then plumbed through libvirt.
 

That is somewhat unfair as a blanket statement!
   


Sorry, you're certainly correct.  Some features appear quickly, but 
others can take an awfully long time.



It wouldn't be so bad if libvirt was basically a passthrough interface
to qemu but it tries to model everything in a generic way which is more
or less doomed to fail when you're adding lots of new features (as we are).

The list of things that libvirt doesn't support and won't any time soon
is staggering.
 

As previously discussed, we want to improve both the set of features
supported, and make it much easier to support new features promptly.
The QMP & qdev stuff has been a very good step forward in making it
easier to support QEMU management. There have been proposals from
several people, yourself included, on how to improve libvirt's support
for the full range of QEMU features. We're committed to looking at this
and figuring out which proposals are practical to support, so we can
improve QEMU & libvirt interaction for everyone.
   


Regards,

Anthony Liguori


Regards,
Daniel
   




Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-22 Thread Daniel P. Berrange
On Mon, Mar 22, 2010 at 02:15:35PM -0500, Anthony Liguori wrote:
> On 03/22/2010 12:55 PM, Avi Kivity wrote:
> >>Lets look at the ${HOME}/.qemu/qmp/ enumeration method suggested by 
> >>Anthony.
> >>There's numerous ways that this can break:
> >
> >I don't like it either.  We have libvirt for enumerating guests.
> 
> We're stuck in a rut with libvirt and I think a lot of the 
> dissatisfaction with qemu is rooted in that.  It's not libvirt that's 
> the problem, but the relationship between qemu and libvirt.
> 
> We add a feature to qemu and maybe after six months it gets exposed by 
> libvirt.  Release time lines of the two projects complicate the 
> situation further.  People that write GUIs are limited by libvirt 
> because that's what they're told to use and when they need something 
> simple, they're presented with first getting that feature implemented in 
> qemu, then plumbed through libvirt.

That is somewhat unfair as a blanket statement! 

While some features have had a long time delay & others are not supported
at all, in many cases we have had zero delay. We have been supporting QMP,
qdev, vhost-net since before the patches for those features were even merged
in QEMU GIT! It varies depending on how closely QEMU & libvirt people have
been working together on a feature, and on how strongly end users are demanding
the features. 

> It wouldn't be so bad if libvirt was basically a passthrough interface 
> to qemu but it tries to model everything in a generic way which is more 
> or less doomed to fail when you're adding lots of new features (as we are).
> 
> The list of things that libvirt doesn't support and won't any time soon 
> is staggering.

As previously discussed, we want to improve both the set of features
supported, and make it much easier to support new features promptly.
The QMP & qdev stuff has been a very good step forward in making it
easier to support QEMU management. There have been proposals from 
several people, yourself included, on how to improve libvirt's support
for the full range of QEMU features. We're committed to looking at this
and figuring out which proposals are practical to support, so we can
improve QEMU & libvirt interaction for everyone.

Regards,
Daniel
-- 
|: Red Hat, Engineering, London-o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org-o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|


Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-22 Thread Avi Kivity

On 03/22/2010 09:20 PM, Joerg Roedel wrote:



Why doesn't it solve the bisectability problem? The kernel repo is supposed to
be bisectable so that problem would be solved.
 

Because Marcelo and Avi try to keep as close to upstream qemu as
possible. So the qemu repo is regularly merged in qemu-kvm and if you
want to bisect you may end up somewhere in the middle of the qemu
repository which has only very minimal kvm-support.
The problem here is that two qemu repositories exist. But the current
effort of Anthony is directed to create a single qemu repository. But
that's not done overnight.
   


It's in fact possible to bisect qemu-kvm.git.  If you end up in 
qemu.git, do a 'git bisect skip'.  If you end up in a merge, call the 
merge point A, bisect A^1..A^2, each time merging A^1 before compiling 
(the merge is always trivial due to the way we do it).


Not fun, but it works.  When we complete merging kvm integration into 
qemu.git, this problem will disappear.
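
The 'git bisect skip' step can be automated with 'git bisect run', which
treats exit code 125 as "cannot test this commit, skip it". A small Python
sketch of the decision logic follows; how a script would detect that the
checked-out commit is plain qemu.git without kvm support is left as a
hypothetical boolean input, and the merge-point procedure (bisecting
A^1..A^2) is not covered.

```python
GOOD, BAD, SKIP = 0, 1, 125  # exit codes understood by 'git bisect run'


def verdict(has_kvm_support, test_passed):
    """Map the state of the checked-out commit to a 'git bisect run' exit code."""
    if not has_kvm_support:
        # The commit comes from plain qemu.git: it cannot be judged, so skip it.
        return SKIP
    return GOOD if test_passed else BAD
```

A wrapper script would build the tree, run the reproducer, and exit with
verdict(...) so that bisection continues without manual skips.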


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-22 Thread Anthony Liguori

On 03/22/2010 02:22 PM, Ingo Molnar wrote:

Transitive had a product that was using a KVM context to run their
binary translator which allowed them full access to the host
process's virtual address space range.  In this case, there is no
kernel and there are no devices.
 

And your point is that such vcpus should be excluded from profiling just
because they fall outside the Qemu/libvirt umbrella?
   


You don't instrument it the way you'd instrument an operating system so 
no, you don't want it to show up in perf kvm top.


Regards,

Anthony Liguori


Ingo
   




Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-22 Thread Andrea Arcangeli
On Mon, Mar 22, 2010 at 08:10:28PM +0100, Ingo Molnar wrote:
> I posit that it's both: and that priorities can be communicated - if only you 
> try as a maintainer. All i'm suggesting is to add 'usable, unified user-space' 
> to the list of unfun priorities, because it's possible and because it matters.

IMHO blaming anybody for it but qemu maintainership is very
unfair. They intentionally reinvented a less self-contained,
inferior, underperforming, underfeatured wheel instead of doing the
right thing and just making sure that it is as self-contained as
possible to avoid risking destabilizing their existing codebase. What
can anybody (without qemu git commit access) do about it unless the qemu
git maintainer changes attitude, dumps the qemu/kvm-all.c nonsense for
good, and does the right thing so we can unify for real?

We need to move forward, including multithreading the qemu core, and be
ready to include desktop virtualization protocols when they're ready
for submission, without being told to extend vnc instead to gain a
similar speedup (i.e. yet another inferior wheel).

Unification means that _all_ qemu users, pure research, theoretical
interest, Xen, virtualbox, weird pure software architecture, will be
able to push their stuff in for the common good, but that also shall
apply to KVM! It has to become clear that reinventing inferior wheels
instead of merging the real thing is an absolute waste of time,
unnecessary, and it won't make any difference as far as KVM is
concerned, proof is that 0% of userbase runs qemu git to run KVM
(except the kvm-all.c developers to test it perhaps or somebody by
mistake not adding -kvm prefix to command line maybe). I don't pretend
to rate KVM as more important than all the rest of niche usages for
qemu but it shall be _as_ important as the rest and it'd be nice one
day to be able to install only qemu on a system and get something
actually usable in production.

I very much like that qemu gets contributions from everywhere, it's
also nice it can run without KVM (that is purely useful as a
debugging tool to me but still...). I think it can all happen and
unification should be the objective, for the gain of everyone in both
qemu/kvm and even xen and all the rest.


Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-22 Thread Ingo Molnar

* Anthony Liguori  wrote:

> On 03/22/2010 12:34 PM, Ingo Molnar wrote:
> >This is really just the much-discredited microkernel approach for keeping
> >global enumeration data that should be kept by the kernel ...
> >
> >Lets look at the ${HOME}/.qemu/qmp/ enumeration method suggested by Anthony.
> >There's numerous ways that this can break:
> >
> >  - Those special files can get corrupted, mis-setup, get out of sync, or can
> >be hard to discover.
> >
> >  - The ${HOME}/.qemu/qmp/ solution suggested by Anthony has a very obvious
> >design flaw: it is per user. When i'm root i'd like to query _all_ current
> >guest images, not just the ones started by root. A system might not even
> >have a notion of '${HOME}'.
> >
> >  - Apps might start KVM vcpu instances without adhering to the
> >${HOME}/.qemu/qmp/ access method.
> >
> >  - There is no guarantee for the Qemu process to reply to a request - while
> >the kernel can always guarantee an enumeration result. I dont want 'perf
> >kvm' to hang or misbehave just because Qemu has hung.
> 
> If your position basically boils down to, we can't trust userspace
> and we can always trust the kernel, I want to eliminate any
> userspace path, then I can't really help you out.

Why would you want to 'help me out'? I can tell a good solution from a bad one 
just fine.

You should instead read the long list of disadvantages above, invert them and 
list them as advantages for the kernel-based vcpu enumeration solution, apply 
common sense and go admit to yourself that indeed in this situation a kernel 
provided enumeration of vcpu contexts is the most robust solution.

It's really as simple as that :-)
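
To make the contrast with a per-user directory concrete, here is a minimal Python sketch of enumeration that relies only on what the kernel already exposes. It is not the /proc/<pid>/kvm interface discussed in this thread, which is a proposal; it rests on the assumption that a KVM VM file descriptor shows up under /proc/<pid>/fd as a link whose target starts with "anon_inode:kvm-vm". The function name `find_kvm_pids` is hypothetical.

```python
import os


def find_kvm_pids(proc_root="/proc"):
    """Return sorted PIDs holding a file descriptor that looks like a KVM VM.

    Assumption: the fd's link target starts with "anon_inode:kvm-vm".
    """
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission to look
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # fd was closed between listdir and readlink
            if target.startswith("anon_inode:kvm-vm"):
                pids.append(int(entry))
                break  # one match per process is enough
    return sorted(pids)


if __name__ == "__main__":
    print(find_kvm_pids())
```

Run as root this covers every user's guests regardless of which userspace created them; run unprivileged it only sees processes whose fd directory is readable, so it does not remove the permission question.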

Thanks,

Ingo


Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-22 Thread Avi Kivity

On 03/22/2010 09:10 PM, Ingo Molnar wrote:
> * Avi Kivity  wrote:
>
>> On 03/22/2010 06:32 PM, Ingo Molnar wrote:
>>> So, what do you think creates code communities and keeps them alive?
>>> Developers and code. And the wellbeing of developers are primarily
>>> influenced by the repository structure and by the development/maintenance
>>> process - i.e. by the 'fun' aspect. (i'm simplifying things there but
>>> that's the crux of it.)
>>
>> There is nothing fun about having one repository or two.  Who cares about
>> this anyway?
>>
>> tools/kvm/ probably will draw developers, simply because of the glory
>> associated with kernel work.  That's a bug, not a feature.  It means that
>> effort is not distributed according to how it's needed, but because of
>> irrelevant considerations.
>
> And yet your solution to that is to ... do all your work in the kernel space
> and declare the tooling as something that does not interest you? ;-)


I have done plenty of userspace work in qemu.  I don't have a lack of 
interest in qemu, just in a desktop GUI.  I'm not a GUI person and my 
employer doesn't have a desktop-on-desktop virtualization product that I 
know of.



>> Something I've wanted for a long time is to port kvm_stat to use tracepoints
>> instead of the home-grown instrumentation.  But that is unrelated to this
>> new tracepoint.  Other than that we're satisfied with ftrace.
>
> Despite it being another in-kernel subsystem that by your earlier arguments
> should be done via a user-space package? ;-)


I'm satisfied with it as a user.  Architecturally, I'd have preferred it 
to be a userspace tool.  It might have improved usability as well to 
have something with --help instead of a set of debugfs files.  But I'm a 
lot happier with ftrace existing as a kernel component than not at all.



>>> You should realize that naturally developers will gravitate towards the
>>> most 'fun' aspects of a project. It is the task of the maintainer to keep
>>> the balance between fun and utility, bugs and features, quality and
>>> code-rot.
>>
>> There are plenty of un-fun tasks (like fixing bugs and providing RAS
>> features) that we're doing.  We don't do this for fun but to satisfy our
>> users.
>
> So which one is it, KVM developers are volunteers that do fun stuff and cannot
> be told about project priorities, or KVM developers are pros who do unfun
> stuff because they can be told about priorities?


From my point of view as maintainer, all contributors are volunteers, I 
can't tell any of them what to do.  From the point of view of many of 
these volunteer's employers, they are wage slaves who do as they're told 
or else.


So: when someone sends me a patch I gratefully accept if it is good or 
point out the issues if not.  At the secret Red Hat headquarters and the 
kvm weekly conference call I participate in deciding priorities and task 
assignments.



> I posit that it's both: and that priorities can be communicated - if only you
> try as a maintainer. All i'm suggesting is to add 'usable, unified user-space'
> to the list of unfun priorities, because it's possible and because it matters.


So: I require a volunteer to write some GUI code before I accept a 
patch.  Back at the Red Hat lair, we think of what features we drop from 
the product because the kvm maintainer has gone nuts.


The 'unified' part of your suggestion is not a requirement, but an 
implementation detail.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-22 Thread Ingo Molnar

* Anthony Liguori  wrote:

> On 03/22/2010 12:34 PM, Ingo Molnar wrote:
> >* Avi Kivity  wrote:
> >
> >  - Easy default reference to guest instances, and a way for tools to
> >reference them symbolically as well in the multi-guest case. Preferably
> >something trustable and kernel-provided - not some indirect information
> >like a PID file created by libvirt-manager or so.
> Usually 'layering violation' is trotted out at such suggestions.
> [...]
> >>>That's weird, how can a feature request be a 'layering violation'?
> >>The 'something trustable and kernel-provided'.  The kernel knows nothing
> >>about guest names.
> >The kernel certainly knows about other resources such as task names or network
> >interface names or tracepoint names. This is kernel design 101.
> >
> >>>If something that users find straightforward and usable is a layering
> >>>violation to you (such as easily being able to access their own files on
> >>>the host as well ...) then i think you need to revisit the definition of
> >>>that term instead of trying to fix the user.
> >>Here is the explanation, you left it quoted:
> >>
> [...]  I don't like using the term, because sometimes the layers are
> incorrect and need to be violated.  But it should be done explicitly, not
> as a shortcut for a minor feature (and profiling is a minor feature, most
> users will never use it, especially guest-from-host).
> 
> The fact is we have well defined layers today, kvm virtualizes the cpu
> and memory, qemu emulates devices for a single guest, libvirt manages
> guests. We break this sometimes but there has to be a good reason.  So
> perf needs to talk to libvirt if it wants names.  Could be done via
> linking, or can be done using a plugin libvirt drops into perf.
> >This is really just the much-discredited microkernel approach for keeping
> >global enumeration data that should be kept by the kernel ...
> >
> >Lets look at the ${HOME}/.qemu/qmp/ enumeration method suggested by Anthony.
> >There's numerous ways that this can break:
> >
> >  - Those special files can get corrupted, mis-setup, get out of sync, or can
> >be hard to discover.
> >
> >  - The ${HOME}/.qemu/qmp/ solution suggested by Anthony has a very obvious
> >design flaw: it is per user. When i'm root i'd like to query _all_ current
> >guest images, not just the ones started by root. A system might not even
> >have a notion of '${HOME}'.
> >
> >  - Apps might start KVM vcpu instances without adhering to the
> >${HOME}/.qemu/qmp/ access method.
> 
> Not all KVM vcpus are running operating systems.

But we want to allow developers to instrument all of them ...

> Transitive had a product that was using a KVM context to run their
> binary translator which allowed them full access to the host
> processes virtual address space range.  In this case, there is no
> kernel and there are no devices.

And your point is that such vcpus should be excluded from profiling just 
because they fall outside the Qemu/libvirt umbrella?

That is a ridiculous position.

Ingo


Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-22 Thread Joerg Roedel
On Mon, Mar 22, 2010 at 05:32:15PM +0100, Ingo Molnar wrote:
> I dont know how you can find the situation of Alpha comparable, which is a 
> legacy architecture for which no new CPU was manufactured in the past ~10 
> years.
> 
> The negative effects of physical obsolescence cannot be overcome even by the 
> very best of development models ...

The maintainers of that architecture could at least continue to maintain
it. But that is not the case. Most newer syscalls are not available and
overall stability on alpha sucks (kernel crashed when I tried to start
Xorg for example) but nobody cares about it. Hardware is still around
and there are still some users of it.

> > > * Joerg Roedel  wrote:
> > No, the split-repository situation was the smallest problem after all. Its 
> > was a community thing. If the community doesn't work a single-repo project 
> > will also fail. [...]
> 
> So, what do you think creates code communities and keeps them alive? 
> Developers and code. And the wellbeing of developers are primarily influenced 
> by the repository structure and by the development/maintenance process - i.e. 
> by the 'fun' aspect. (i'm simplifying things there but that's the crux of it.)

Right. A living community needs developers that write new code. And the
repository structure is one important thing. But in my opinion it is not
the most important one. In my 3-4 years in the kernel community, my
experience has been that the maintainers are the most important factor.
I find a maintainer not committing or caring about patches, or not
releasing new versions, much worse than the wrong repository structure.
oProfile has this problem with its userspace part. I partly had this
bad experience with x86-64 before the architecture merge. KVM does not
have this problem.

> So yes, i do claim that what stifled and eventually killed off the Oprofile 
> community was the split repository. None of the other Oprofile shortcomings 
> were really unfixable, but this one was. It gave no way for the community to 
> grow in a healthy way, after the initial phase. Features were more difficult 
> and less fun to develop.

The biggest problem oProfile has is that it does not support per-process
measuring. This is indeed not unfixable but it also doesn't fit well in
the overall oProfile concept.

> I simply do not want to see KVM face the same fate, and yes i do see similar 
> warnings signs.

In fact, the development process in KVM has improved over time. In the
early beginnings everything was kept in svn. Avi switched to git at some
point, but back when we had these kvm-XX releases, kernel- and
user-space together were unbisectable. This has improved to the point
where the kernel part can be bisected. The KVM maintainers and
community have shown in the past that they can address problems with the
development process if they come up.

> Oprofile certainly had good developers and maintainers as well. In the end it 
> wasnt enough ...
> 
> Also, a project can easily still be 'alive' but not reach its full potential. 
> 
> Why do you assume that my argument means that KVM isnt viable today? It can 
> very well still be viable and even healthy - just not _as healthy_ as it could 
> be ...

I am not aware that I made you say anything ;-)

> 
> > > The difference is that we dont have KVM with a decade of history and we 
> > > dont have a 'told you so' KVM reimplementation to show that proves the 
> > > point. I guess it's a matter of time before that happens, because Qemu 
> > > usability is so abysmal today - so i guess we should suspend any 
> > > discussions until that happens, no need to waste time on arguing 
> > > hypotheticals.
> > 
> > We actually have lguest which is small. But it lacks functionality and the 
> > developer community KVM has attracted.
> 
> I suggested long ago to merge lguest into KVM to cover non-VMX/non-SVM 
> execution.

That would have been the best. Rusty already started this work and
presented it at the first KVM Forum. But I have never seen patches ...

> > > I think you are rationalizing the status quo.
> > 
> > I see that there are issues with KVM today in some areas. You pointed out 
> > the desktop usability already. I personally have trouble with the 
> > qemu-kvm.git because it is unbisectable. But repository unification doesn't 
> > solve the problem here.
> 
> Why doesnt it solve the bisectability problem? The kernel repo is supposed to 
> be bisectable so that problem would be solved.

Because Marcelo and Avi try to keep as close to upstream qemu as
possible. So the qemu repo is regularly merged in qemu-kvm and if you
want to bisect you may end up somewhere in the middle of the qemu
repository which has only very minimal kvm-support.
The problem here is that two qemu repositorys exist. But the current
effort of Anthony is directed to create a single qemu repository. But
thats not done overnight.
Merging qemu into the kernel would make Linus in fact a qemu maintainer.
I am not su

Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-22 Thread Ingo Molnar

* Avi Kivity  wrote:

> > Lets look at the ${HOME}/.qemu/qmp/ enumeration method suggested by 
> > Anthony. There's numerous ways that this can break:
> 
> I don't like it either.  We have libvirt for enumerating guests.

Which has pretty much the same problems as the ${HOME}/.qemu/qmp/ solution, 
obviously.

> >  - Those special files can get corrupted, mis-setup, get out of sync, or can
> >be hard to discover.
> >
> >  - The ${HOME}/.qemu/qmp/ solution suggested by Anthony has a very obvious
> >design flaw: it is per user. When i'm root i'd like to query _all_ current
> >guest images, not just the ones started by root. A system might not even
> >have a notion of '${HOME}'.
> >
> >  - Apps might start KVM vcpu instances without adhering to the
> >${HOME}/.qemu/qmp/ access method.
> 
> - it doesn't work with nfs.

So out of a list of 4 disadvantages your reply is that you agree with 3?

> >  - There is no guarantee for the Qemu process to reply to a request - while
> >the kernel can always guarantee an enumeration result. I dont want 'perf
> >kvm' to hang or misbehave just because Qemu has hung.
> 
> If qemu doesn't reply, your guest is dead anyway.

Erm, but i'm talking about a dead tool here. There's a world of a difference 
between 'kvm top' not showing new entries (because the guest is dead), and 
'perf kvm top' hanging due to Qemu hanging.
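
The hang concern can be narrowed, though not removed, on the tool side by bounding every wait on the monitor socket. The sketch below is only an illustration under assumptions: it assumes each guest exposes a QMP UNIX socket somewhere (the per-guest socket path is hypothetical), that the server sends a one-line JSON greeting on connect, and `read_qmp_greeting` is a made-up helper name.

```python
import json
import socket


def read_qmp_greeting(path, timeout=1.0):
    """Return the greeting dict from a QMP-style UNIX socket, or None.

    None means the monitor is missing, closed the connection, sent
    something unparsable, or did not answer within `timeout` seconds
    (the timeout applies to each socket operation, not to the total).
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
        data = b""
        # The greeting is a single JSON object terminated by a newline.
        while not data.endswith(b"\n"):
            chunk = sock.recv(4096)
            if not chunk:
                return None  # peer closed before completing the greeting
            data += chunk
        return json.loads(data)
    except (OSError, ValueError):
        # OSError covers timeouts and connection failures,
        # ValueError covers malformed JSON.
        return None
    finally:
        sock.close()
```

A caller can then report a guest as unresponsive and move on. A timeout only turns a hang into a missing answer, so the tool still gets no data from a stuck Qemu.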

So it's essentially 4 out of 4. Yet your reply isnt "Ingo you are right" but 
"hey, too bad" ?

> > Really, for such reasons user-space is pretty poor at doing system-wide 
> > enumeration and resource management. Microkernels lost for a reason.
> 
> Take a look at your desktop, userspace is doing all of that everywhere, from 
> enumerating users and groups, to deciding how your disks are named.  The 
> kernel only provides the bare facilities.

We dont do that for robust system instrumentation, for heaven's sake!

By your argument it would be perfectly fine to implement /proc purely via 
user-space, correct?

> > You are committing several grave design mistakes here.
> 
> I am committing on the shoulders of giants.

Really, this is getting outright ridiculous. You agree with me that Anthony 
suggested a technically inferior solution, yet you even seem to be proud of it 
and are joking about it?

And _you_ are complaining about lkml-style hard-talk discussions?

Thanks,

Ingo


Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-22 Thread Anthony Liguori

On 03/22/2010 02:10 PM, Ingo Molnar wrote:
> I posit that it's both: and that priorities can be communicated - if only you
> try as a maintainer. All i'm suggesting is to add 'usable, unified user-space'
> to the list of unfun priorities, because it's possible and because it matters.


I've spent the past few months dealing with customers using the 
libvirt/qemu/kvm stack.  Usability is a major problem and is a top 
priority for me.  That is definitely a shift but that occurred before 
you started your thread.


But I disagree with your analysis of what the root of the problem is.  
It's a very kernel centric view and doesn't consider the interactions 
between userspace.


Regards,

Anthony Liguori


> Ingo




Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-22 Thread Anthony Liguori

On 03/22/2010 12:55 PM, Avi Kivity wrote:
>> Lets look at the ${HOME}/.qemu/qmp/ enumeration method suggested by
>> Anthony. There's numerous ways that this can break:
>
> I don't like it either.  We have libvirt for enumerating guests.


We're stuck in a rut with libvirt and I think a lot of the 
dissatisfaction with qemu is rooted in that.  It's not libvirt that's 
the problem, but the relationship between qemu and libvirt.


We add a feature to qemu and maybe after six months it gets exposed by 
libvirt.  Release time lines of the two projects complicate the 
situation further.  People that write GUIs are limited by libvirt 
because that's what they're told to use and when they need something 
simple, they're presented with first getting that feature implemented in 
qemu, then plumbed through libvirt.


It wouldn't be so bad if libvirt was basically a passthrough interface 
to qemu but it tries to model everything in a generic way which is more 
or less doomed to fail when you're adding lots of new features (as we are).


The list of things that libvirt doesn't support and won't any time soon 
is staggering.


libvirt serves an important purpose, but we need to do a better job in 
qemu with respect to usability.  We can't just punt to libvirt.


Regards,

Anthony Liguori



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-22 Thread Ingo Molnar

* Avi Kivity  wrote:

> On 03/22/2010 06:32 PM, Ingo Molnar wrote:
> >
> > So, what do you think creates code communities and keeps them alive? 
> > Developers and code. And the wellbeing of developers are primarily 
> > influenced by the repository structure and by the development/maintenance 
> > process - i.e. by the 'fun' aspect. (i'm simplifying things there but 
> > that's the crux of it.)
> 
> There is nothing fun about having one repository or two.  Who cares about 
> this anyway?
> 
> tools/kvm/ probably will draw developers, simply because of the glory 
> associated with kernel work.  That's a bug, not a feature.  It means that 
> effort is not distributed according to how it's needed, but because of 
> irrelevant considerations.

And yet your solution to that is to ... do all your work in the kernel space 
and declare the tooling as something that does not interest you? ;-)

> Something I've wanted for a long time is to port kvm_stat to use tracepoints 
> instead of the home-grown instrumentation.  But that is unrelated to this 
> new tracepoint.  Other than that we're satisfied with ftrace.

Despite it being another in-kernel subsystem that by your earlier arguments 
should be done via a user-space package? ;-)

> > You should realize that naturally developers will gravitate towards the 
> > most 'fun' aspects of a project. It is the task of the maintainer to keep 
> > the balance between fun and utility, bugs and features, quality and 
> > code-rot.
> 
> There are plenty of un-fun tasks (like fixing bugs and providing RAS 
> features) that we're doing.  We don't do this for fun but to satisfy our 
> users.

So which one is it, KVM developers are volunteers that do fun stuff and cannot 
be told about project priorities, or KVM developers are pros who do unfun 
stuff because they can be told about priorities?

I posit that it's both: and that priorities can be communicated - if only you 
try as a maintainer. All i'm suggesting is to add 'usable, unified user-space' 
to the list of unfun priorities, because it's possible and because it matters.

Ingo


Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-22 Thread Avi Kivity

On 03/22/2010 04:54 PM, Ingo Molnar wrote:
> * Pekka Enberg  wrote:
>
>> Hi Avi,
>>
>> On Mon, Mar 22, 2010 at 2:49 PM, Avi Kivity  wrote:
>>> Seems like perf is also split, with sysprof being developed outside the
>>> kernel.  Will you bring sysprof into the kernel?  Will every feature be
>>> duplicated in prof and sysprof?
>>
>> I am glad you brought it up! Sysprof was historically outside of the kernel
>> (with its own kernel module, actually). While the GUI was nice, it was much
>> harder to set up compared to oprofile so it wasn't all that popular. Things
>> improved slightly when Ingo merged the custom kernel module but the
>> _userspace_ part of sysprof was lagging behind a bit. I don't know what's
>> the situation now that they've switched over to perf syscalls but you
>> probably get my point.
>>
>> It would be nice if the two projects merged but I honestly don't see any
>> fundamental problem with two (or more) co-existing projects. Friendly
>> competition will ultimately benefit the users (think KDE and Gnome here).
>
> See my previous mail - what i see as the most healthy project model is to have
> a full solution reference implementation, connected to a flexible halo of
> plugins or sub-apps.
>
> Firefox does that, KDE does that, and Gnome as well to a certain degree.
>
> The 'halo' provides a constant feedback of new features, and it also provides
> competition and pressure on the 'main' code to be top-notch.
>
> The problem i see with KVM is that there's no reference implementation! There
> is _only_ the KVM kernel part which is not functional in itself. Surrounded by
> a 'halo' - where none of the entities is really 'the' reference implementation
> we call 'KVM'.


The reference implementation is qemu-kvm.git, in the future qemu.git.  
Like the reference implementation of device-mapper is 
lvm2/device-mapper, not tools/device-mapper.



> This causes constant quality problems as the developers of the main project
> dont have constant pressure towards good quality (it is not their
> responsibility to care about user-space bits after all),


The developers of the main project are very much aware that users don't 
call the ioctls directly but instead use qemu.



> plus it causes a lack
> of focus as well: integration between (friendly) competing user-space
> components is a lot harder than integration within a single framework such as
> Firefox.


We are very focused, just not on what you think we should be focused.


> I hope this explains my points about modularization a bit better! I suggested
> KVM to grow a user-space tool component in the kernel repo in tools/kvm/,
> which would become the reference implementation for tooling. User-space
> projects can still provide alternative tooling or can plug into this tooling,
> just like they are doing it now. So the main effect isnt even on those
> projects but on the kernel developers. The ABI remains and all the user-space
> packages and projects remain.


Seems like wanton duplication of effort.  Can we throw so many 
developer-years away on duplicate projects?  Assuming not all are true 
volunteers (85% for 2.6.33) who will fund this duplicate effort?



> Yes, i thought Qemu would be a prime candidate to be the baseline for
> tools/kvm/, but i guess that has become socially impossible now after this
> flamewar. It's not a big problem in the big scheme of things: tools/kvm/ is
> best grown up from a small towards larger size anyway ...


Qemu is open source, you can cp it into tools/kvm.  Rewriting it from 
scratch is a mammoth effort, there's a reason kvm, Xen, and virtualbox 
all use qemu.  Qemu itself copied code from bochs.  Writing this stuff 
is hard, especially if there is something already working.


You'll probably get much better threading (the qemu device model is 
still single threaded), but it will take years to reach where qemu is 
already at.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-22 Thread Avi Kivity

On 03/22/2010 08:10 PM, Pekka Enberg wrote:
> On Mon, Mar 22, 2010 at 8:04 PM, Avi Kivity  wrote:
>>> That said, pulling 400 KLOC of code into the kernel sounds really
>>> excessive. Would we need all that if we just do native virtualization
>>> and no actual emulation?
>>
>> What is native virtualization and no actual emulation?
>
> What I meant with "actual emulation" was running architecture A code
> on architecture B what was qemu's traditional use case. So the
> question was how much of the 400 KLOC do we need for just KVM on all
> the architectures that it supports?


qemu is 620 KLOC.  Without cpu emulation that drops to ~480 KLOC.  Much 
of that is device emulation that is not supported by kvm now (like ARM) 
but some might be needed again in the future (like ARM).


x86-only is perhaps 300 KLOC, but kvm is not x86 only.

And that is with a rudimentary GUI.  GUIs are heavy.

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-22 Thread Anthony Liguori

On 03/22/2010 12:34 PM, Ingo Molnar wrote:
> This is really just the much-discredited microkernel approach for keeping
> global enumeration data that should be kept by the kernel ...
>
> Lets look at the ${HOME}/.qemu/qmp/ enumeration method suggested by Anthony.
> There's numerous ways that this can break:
>
>   - Those special files can get corrupted, mis-setup, get out of sync, or can
>     be hard to discover.
>
>   - The ${HOME}/.qemu/qmp/ solution suggested by Anthony has a very obvious
>     design flaw: it is per user. When i'm root i'd like to query _all_ current
>     guest images, not just the ones started by root. A system might not even
>     have a notion of '${HOME}'.
>
>   - Apps might start KVM vcpu instances without adhering to the
>     ${HOME}/.qemu/qmp/ access method.
>
>   - There is no guarantee for the Qemu process to reply to a request - while
>     the kernel can always guarantee an enumeration result. I dont want 'perf
>     kvm' to hang or misbehave just because Qemu has hung.


If your position basically boils down to, we can't trust userspace and 
we can always trust the kernel, I want to eliminate any userspace path, 
then I can't really help you out.


I believe we can come up with an infrastructure that satisfies your 
actual requirements within qemu but if you're also insisting upon the 
above implementation detail then there's nothing I can do.


Regards,

Anthony Liguori



Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-22 Thread Anthony Liguori

On 03/22/2010 12:34 PM, Ingo Molnar wrote:

* Avi Kivity  wrote:

   

  - Easy default reference to guest instances, and a way for tools to
reference them symbolically as well in the multi-guest case. Preferably
something trustable and kernel-provided - not some indirect information
like a PID file created by libvirt-manager or so.
   

Usually 'layering violation' is trotted out at such suggestions.
[...]
 

That's weird, how can a feature request be a 'layering violation'?
   

The 'something trustable and kernel-provided'.  The kernel knows nothing
about guest names.
 

The kernel certainly knows about other resources such as task names or network
interface names or tracepoint names. This is kernel design 101.

   

If something that users find straightforward and usable is a layering
violation to you (such as easily being able to access their own files on
the host as well ...) then i think you need to revisit the definition of
that term instead of trying to fix the user.
   

Here is the explanation, you left it quoted:

 

[...]  I don't like using the term, because sometimes the layers are
incorrect and need to be violated.  But it should be done explicitly, not
as a shortcut for a minor feature (and profiling is a minor feature, most
users will never use it, especially guest-from-host).

The fact is we have well defined layers today, kvm virtualizes the cpu
and memory, qemu emulates devices for a single guest, libvirt manages
guests. We break this sometimes but there has to be a good reason.  So
perf needs to talk to libvirt if it wants names.  Could be done via
linking, or can be done using a plugin libvirt drops into perf.
 

This is really just the much-discredited microkernel approach for keeping
global enumeration data that should be kept by the kernel ...

Let's look at the ${HOME}/.qemu/qmp/ enumeration method suggested by Anthony.
There are numerous ways that this can break:

  - Those special files can get corrupted, mis-setup, get out of sync, or can
be hard to discover.

  - The ${HOME}/.qemu/qmp/ solution suggested by Anthony has a very obvious
design flaw: it is per user. When I'm root I'd like to query _all_ current
guest images, not just the ones started by root. A system might not even
have a notion of '${HOME}'.

  - Apps might start KVM vcpu instances without adhering to the
${HOME}/.qemu/qmp/ access method.
   


Not all KVM vcpus are running operating systems.

Transitive had a product that used a KVM context to run their binary
translator, which gave it full access to the host process's virtual
address space range.  In this case, there is no kernel and there are no
devices.


That's what I mean by a guest being a userspace context.  KVM simply
provides a new CPU mode to userspace in the same way that vm8086 mode does.


Regards,

Anthony Liguori


  - There is no guarantee for the Qemu process to reply to a request - while
the kernel can always guarantee an enumeration result. I don't want 'perf
kvm' to hang or misbehave just because Qemu has hung.

Really, for such reasons user-space is pretty poor at doing system-wide
enumeration and resource management. Microkernels lost for a reason.

You are committing several grave design mistakes here.

Thanks,

Ingo
   




Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-22 Thread Anthony Liguori

On 03/22/2010 12:11 PM, Ingo Molnar wrote:

* Anthony Liguori  wrote:

   

  - Easy default reference to guest instances, and a way for tools to
reference them symbolically as well in the multi-guest case. Preferably
something trustable and kernel-provided - not some indirect information
like a PID file created by libvirt-manager or so.
   

A guest is not a KVM concept. [...]
 

Well, in a sense a guest is a KVM concept too: it's in essence represented via
the 'vcpu state attached to a struct mm' abstraction that is attached to the
/dev/kvm file descriptor attached to a Linux process.

Multiple vcpus can be started by the same process to represent SMP, but the
whole guest notion is present: a Linux MM that carries KVM state.

In that sense when we type 'perf kvm list' we'd like to get a list of all
currently present guests that the developer has permission to profile: i.e.
we'd like a list of all [debuggable] Linux tasks that have a KVM instance
attached to them.

A convenient way to do that would be to use the Qemu process's ->comm[] name,
and to have a KVM ioctl that gets us a list of all vcpus that the querying
task has ptrace permission to. [the standard permission check we do for
instrumentation]

No need for communication with Qemu for that - just an ioctl, and an
always-guaranteed result that works fine on a whole-system and on a per user
basis as well.
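
For comparison, a user-space approximation of this enumeration is possible
without any new ioctl, by scanning /proc for processes that hold a KVM VM file
descriptor. The sketch below is not the ioctl proposed above. It assumes that
KVM VM descriptors show up in /proc/<pid>/fd as links to "anon_inode:kvm-vm"
and that /proc/<pid>/comm is available; the function names are made up for
illustration. Processes the caller is not allowed to inspect are skipped, which
only roughly mirrors the permission check described above.

```python
import os

# Assumption: a KVM VM file descriptor appears under this link target.
KVM_VM_LINK = "anon_inode:kvm-vm"


def read_comm(proc_root, pid):
    """Return the task name of a process, or '?' if it cannot be read."""
    try:
        with open(os.path.join(proc_root, str(pid), "comm")) as f:
            return f.read().strip()
    except OSError:
        return "?"


def find_kvm_processes(proc_root="/proc"):
    """Return {pid: comm} for every visible process holding a KVM VM fd."""
    guests = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission to look
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor was closed while we were scanning
            if target == KVM_VM_LINK:
                guests[int(entry)] = read_comm(proc_root, entry)
                break
    return guests


if __name__ == "__main__":
    for pid, comm in sorted(find_kvm_processes().items()):
        print(pid, comm)
```

Unlike a kernel-provided list, such a scan is racy (processes can appear or
exit mid-scan) and depends on /proc being mounted, which is part of the
trade-off being argued here.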
   


You need a way to interact with the guest which means you need some type 
of device.  All of the interesting devices are implemented in qemu so 
you're going to have to interact with qemu if you want meaningful 
interaction with a guest.


Regards,

Anthony Liguori


Thanks,

Ingo
   




Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-22 Thread Anthony Liguori

On 03/22/2010 11:59 AM, Ingo Molnar wrote:


Ok, that sounds interesting! I'd rather see some raw mechanism that 'perf kvm'
could use instead of having to require yet another library (which generally
dampens adoption of a tool). So I think we can work from there.
   


You can access the protocol directly if you don't want a library dependency.


Btw., have you considered using Qemu's command name (task->comm[]) as the
symbolic name? That way we could see the guest name in 'top' on the host - a
nice touch.
   


qemu-system-x86_64 -name Fedora,process=qemu-Fedora

Does exactly that.  We don't make this the default, following the principle
of least surprise.  Many users expect to be able to do killall
qemu-system-x86 and if we did this by default, that wouldn't work.
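
The interaction with killall and top comes from the per-task name (comm) kept
by the kernel. It holds at most 15 bytes plus a terminating NUL, so a
qemu-system-x86_64 binary would appear as qemu-system-x86. A minimal sketch of
setting and reading that name with prctl(2) through ctypes follows. It is
Linux-only, the helper names are hypothetical, and it is not claimed to be how
qemu implements the process= option.

```python
import ctypes

# Option numbers from <linux/prctl.h>.
PR_SET_NAME = 15
PR_GET_NAME = 16

# Handle to the C library already loaded into this process (Linux).
_libc = ctypes.CDLL(None, use_errno=True)


def set_task_name(name):
    """Set the calling thread's task name; the kernel keeps 15 bytes at most."""
    buf = ctypes.create_string_buffer(name.encode())
    if _libc.prctl(PR_SET_NAME, buf, 0, 0, 0) != 0:
        raise OSError(ctypes.get_errno(), "prctl(PR_SET_NAME) failed")


def get_task_name():
    """Return the calling thread's task name as the kernel stores it."""
    buf = ctypes.create_string_buffer(16)  # 15 bytes of name plus NUL
    if _libc.prctl(PR_GET_NAME, buf, 0, 0, 0) != 0:
        raise OSError(ctypes.get_errno(), "prctl(PR_GET_NAME) failed")
    return buf.value.decode()


if __name__ == "__main__":
    set_task_name("qemu-Fedora")
    print(get_task_name())
```

Once a process renames itself this way, tools that match on the task name no
longer find it under the binary's name, which is the surprise referred to
above.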



The sockets are named based on UUID and you'll have to connect to a guest
and ask it for its name.  Some guests don't have names so we'll have to
come up with a clever way to describe a nameless VM.
 

I think just exposing the UUID in that lazy case would be adequate? It creates
pressure for VM launchers to use better symbolic names.
   


Yup.
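
Connecting to a per-guest socket and asking for the name could look roughly
like the sketch below. It assumes the socket speaks line-delimited QMP
(greeting, then qmp_capabilities, then query-name) and that the socket file
name is the guest UUID. A timeout keeps a hung qemu from stalling the caller.
The function names and the directory layout are illustrative, not an existing
qemu interface.

```python
import json
import os
import socket


def _command(f, name):
    """Send one QMP command and return its reply, skipping async events."""
    f.write(json.dumps({"execute": name}) + "\n")
    f.flush()
    while True:
        msg = json.loads(f.readline())  # empty read raises ValueError
        if "return" in msg or "error" in msg:
            return msg


def qmp_query_name(sock_path, timeout=2.0):
    """Return the guest name reported over QMP, or None on any failure."""
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)  # applies to connect and to every read
            s.connect(sock_path)
            with s.makefile("rw", encoding="utf-8", newline="\n") as f:
                json.loads(f.readline())  # greeting sent by the server
                _command(f, "qmp_capabilities")
                reply = _command(f, "query-name")
                return reply.get("return", {}).get("name")
    except (OSError, ValueError):
        return None  # missing socket, timeout, or malformed reply


def guest_label(sock_path):
    """Prefer the guest name; fall back to the socket file name (the UUID)."""
    return qmp_query_name(sock_path) or os.path.basename(sock_path)


if __name__ == "__main__":
    # Assumed directory, taken from the path discussed in this thread.
    qmp_dir = os.path.expanduser("~/.qemu/qmp")
    if os.path.isdir(qmp_dir):
        for entry in sorted(os.listdir(qmp_dir)):
            print(guest_label(os.path.join(qmp_dir, entry)))
```

The fallback in guest_label() corresponds to the nameless-VM case: when no
name comes back, the UUID is shown instead.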


I.e.:

  - Easy default reference to guest instances, and a way for tools to
reference them symbolically as well in the multi-guest case. Preferably
something trustable and kernel-provided - not some indirect information
like a PID file created by libvirt-manager or so.
   

A guest is not a KVM concept.  It's a qemu concept so it needs to be
something provided by qemu.  The other caveat is that you won't see guests
created by libvirt because we're implementing this in terms of a default QMP
device and libvirt will disable defaults.  This is desired behaviour.
libvirt wants to be in complete control and doesn't want a tool like perf
interacting with a guest directly.
 

Hm, this sucks for multiple reasons. Firstly, perf isn't a tool that
'interacts', it's an observation tool: just like 'top' is an observation tool.

We want to enable developers to see all activities on the system - regardless
of who started the VM or who started the process. Imagine if we had a way to
hide tasks from 'top'. It would be rather awful.

Secondly, it tells us that the concept is fragile if it doesn't automatically
enumerate all guests, regardless of how they were created.
   


Perf does interact with a guest though because it queries a guest to
read its file system.


I understand the point you're making though.  If, instead of a pull
interface where the host queries the guest for files, the guest pushed
a small set of files at startup which the host cached, then you could
potentially expose, unconditionally, a "read-only" socket that only
exposed limited information.



Full system enumeration is generally best left to the kernel, as it can offer
coherent access.
   


I don't see why qemu can't offer coherent access.  The limitation today 
is intentional and if it's overly restrictive, we can figure out a means 
to change it.


Regards,

Anthony Liguori



Ingo
   




Re: [RFC] Unify KVM kernel-space and user-space code into a single project

2010-03-22 Thread Avi Kivity

On 03/22/2010 04:47 PM, Ingo Molnar wrote:



If you are interested in the first-hand experience of the people who are
doing the perf work then here it is: by far the biggest reason for perf
success and perf usability is the integration of the user-space tooling
with the kernel-space bits, into a single repository and project.
   

Please take a look at the kvm integration code in qemu as a fraction of the
whole code base.
 

You have to admit that much of Qemu's past 2-3 years of development was
motivated by Linux/KVM (I'd say more than 50% of the code).


kvm certainly revitalized qemu development.


As such it's one
and the same code base - you just continue to define Qemu to be different from
KVM.
   


It's not the same code base.  kvm provides a cpu virtualization service, 
qemu uses it.  There could be other users.  qemu could go away one day 
and be replaced by something else (tools/kvm?), and kvm would be unaffected.



I very much remember what Qemu looked like _before_ KVM: it was a struggling,
dying project. KVM clearly changed that.
   


I'm a hero.


The very move you are opposing so vehemently for KVM.
   

I don't want to fracture a working community.
 

Would you accept (or at least not NAK) a new tools/kvm/ tool that builds
tooling from the ground up, while leaving Qemu untouched? [assuming it's all
clean code, etc.]
   


I couldn't NAK tools/kvm any more than I could NAK a new project outside 
the kernel repository.  IMO it would be duplicated effort, but like I 
mentioned before, I can't tell volunteers what to do, only recommend 
that they join the existing effort.



Although I have doubts about how well that would work 'against' your opinion:
such a tool would need lots of KVM-side features and a positive attitude from
you to be really useful. There's a lot of missing functionality to cover.
   


Functionality that can be implemented in userspace will not be accepted 
into kvm unless there are very good reasons why it should be.  Things 
that belong in kvm will be more than welcome.



Seems like perf is also split, with sysprof being developed outside the
kernel.  Will you bring sysprof into the kernel?  Will every feature be
duplicated in perf and sysprof?
 

I'd prefer if sysprof merged into perf as 'perf view' - but its maintainer
does not want that - which is perfectly OK.


You spared him the flamewar, I hope.


So we are building equivalent
functionality into perf instead.
   


Ah, duplicating effort.  Great.


Think about it like Firefox plugins: the main Firefox project picks up the
functionality of the most popular Firefox plugins all the time. Session Saver,
Tab Mix Plus, etc. were all in essence 'merged' (in functionality, not in
code) into the 'reference' Firefox project.
   


There's a difference between absorbing a small plugin and duplicating a 
project.


--
error compiling committee.c: too many arguments to function


