date:20100324

Re: [Qemu-devel] Re: Supporting hypervisor specific APIs in libvirt

2010-03-24 Thread Gerd Hoffmann


On 03/24/10 00:13, Jamie Lokier wrote:

Gerd Hoffmann wrote:

- networking: man, setting networking is a mess, libvirt just does it
   for you.


+1

Even when not using libvirt for one reason or another, I usually hook my
virtual machines into virbr0 (the libvirt default network).


I had the opposite problem.  Needed to use multiple bridges and have
some VMs behind NAT without a bridge (private IPs), and some using
separately firewalled bridges (needed to behave like real attached
hardware with their original MACs, but be firewalled).


No problem in theory.  libvirt should detect existing bridges and allow 
you to attach virtual machines to them.  So you can set up bridges and 
firewalling for them using the usual distro tools and use them for virtual 
machines.


In practice I've seen this not working correctly in the past, i.e. my 
br0 didn't pop up in the virt-manager nic setup page.


cheers,
  Gerd

[Qemu-devel] Re: Supporting hypervisor specific APIs in libvirt

2010-03-24 Thread Juan Quintela

Andi Kleen a...@firstfloor.org wrote:
 Juan Quintela quint...@redhat.com writes:

 - networking: man, setting networking is a mess, libvirt just does it
   for you.

 Agreed it's messy, but isn't this something that the standard qemu
 command line tool could potentially do better by itself? I don't see why you 
 need a wrapper for that.

In my case, it is basically MAC addresses.  I have DHCP set up, and it
always gives the same IP to the same MAC.  But you have to remember to
type the MAC addresses.

This is the typical command line that virsh start launches for me:

/usr/libexec/qemu-kvm -S -M pc-0.12 -enable-kvm -m 1024 -smp
2,sockets=2,cores=1,threads=1 -name f12X-64 -uuid
1fbe73a6-f519-e848-03bd-6636f765d143 -nodefaults -chardev
socket,id=monitor,path=/var/lib/libvirt/qemu/f12X-64.monitor,server,nowait
-mon chardev=monitor,mode=readline -rtc base=utc -boot c -drive
file=/mnt/kvm/images/f12X-64.img,if=none,id=drive-virtio-disk0,boot=on,cache=none
-device
virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
-device
virtio-net-pci,vlan=0,id=net0,mac=54:52:00:44:72:e6,bus=pci.0,addr=0x5
-net tap,fd=18,vlan=0,name=hostnet0 -chardev pty,id=serial0 -device
isa-serial,chardev=serial0 -usb -device usb-tablet,id=input0 -vnc
127.0.0.1:0 -k es -vga cirrus -device
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3

Taking it in parts:

/usr/libexec/qemu-kvm
 -S

I don't want that
 -M pc-0.12

I don't care.

-enable-kvm

I _want_ :)
-m 1024

Also a good idea

-smp 2,sockets=2,cores=1,threads=1

by hand it is always -smp 2
-name f12X-64

-uuid 1fbe73a6-f519-e848-03bd-6636f765d143

don't care

 -nodefaults

-chardev
socket,id=monitor,path=/var/lib/libvirt/qemu/f12X-64.monitor,server,nowait
-mon chardev=monitor,mode=readline

this is simplified as:
  -monitor stdio
when I launch it by hand.

 -rtc base=utc -boot c

don't care

-drive 
file=/mnt/kvm/images/f12X-64.img,if=none,id=drive-virtio-disk0,boot=on,cache=none
-device 
virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0

This is _wow_; I only want to give the disk image path and convince it to
use the virtio driver.

-device virtio-net-pci,vlan=0,id=net0,mac=54:52:00:44:72:e6,bus=pci.0,addr=0x5
-net tap,fd=18,vlan=0,name=hostnet0

This always has to be changed: s|fd=18|script=/etc/kvm-ifup|, and then I
normally find that I also want downscript= to avoid the warning at exit
time.  If I don't give a MAC address, the qemu command line works well,
but as I normally also use vnc I have to:
- launch qemu
- kill it and relaunch it with -vnc :0 instead of -vnc 127.0.0.1:0
- connect to vnc
- check what address the dhcp server gave it this time
- now I can ssh to the guest
With libvirt handling the command line, I just ssh to the same dhcp
address that was given the previous time/day/...

-chardev pty,id=serial0 -device isa-serial,chardev=serial0

I only use serial from time to time, and using -serial
tcp:0,server,nowait (or whatever the syntax is) is easier by hand.

-usb -device usb-tablet,id=input0

usb tablet is mandatory, just in case the guest is able to _not_ grab
the mouse.

-vnc 127.0.0.1:0

Always wrong in my case, because I want to run the vnc client on a
different machine.  A way to convince virt-viewer to connect to a qemu
launched by hand, or a way to convince libvirt to let me edit the
command line, would be great.

 -k es -vga cirrus

This is right by default.

-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3

I normally don't use balloon.
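
The by-hand command line described piece by piece above can be collected into a small helper. This is only a sketch under assumptions: the function name build_qemu_argv, its parameters and its defaults are hypothetical. The options themselves (-enable-kvm, -m, -smp, -monitor stdio, -usb -device usb-tablet, -vnc, -k, and script=/downscript= for tap) are the ones mentioned in this message; if=virtio and -net nic,model=virtio,macaddr= are assumed here as the short spellings for "use virtio", and downscript=no is assumed as the way to skip the down script.

```python
def build_qemu_argv(image, mac, memory_mb=1024, cpus=2,
                    ifup="/etc/kvm-ifup", vnc_display=":0", keymap="es"):
    """Return an argv list for a hand-launched guest (illustrative only)."""
    return [
        "/usr/libexec/qemu-kvm",
        "-enable-kvm",
        "-m", str(memory_mb),
        "-smp", str(cpus),
        # Assumed short form for "disk image path plus virtio".
        "-drive", f"file={image},if=virtio",
        # A fixed MAC keeps the DHCP lease, and so the guest IP, stable.
        "-net", f"nic,model=virtio,macaddr={mac}",
        "-net", f"tap,script={ifup},downscript=no",
        "-monitor", "stdio",
        "-usb", "-device", "usb-tablet",
        # ":0" is not bound to 127.0.0.1, so a remote VNC client can connect.
        "-vnc", vnc_display,
        "-k", keymap,
    ]


argv = build_qemu_argv("/mnt/kvm/images/f12X-64.img", "54:52:00:44:72:e6")
print(" ".join(argv))
```

Keeping the MAC address as an explicit argument is the point of the sketch: it is the one value that has to stay the same between launches for the DHCP setup described above to hand out the same IP.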

Notice that for the bits I normally don't care about, in the end I always
do care.  Why?  Because then somebody arrives and tells me that sound
doesn't work, and I have to edit the config file and add a sound option.
Adding a sound option to the qemu command line is not too complicated.

The other big problem for me is snapshots: I have to remember
_exactly_ what the qemu command line was when I saved the
snapshot.  Guess what, I normally don't remember, and end up:

- launching the old qemu
- saving a new snapshot
- testing with the new qemu and the new snapshot (because now I have the
  command line that I launched 5 minutes before).

Just in case it helps.

Later, Juan.

[Qemu-devel] Re: Compile files only once: some planning

2010-03-24 Thread Juan Quintela

Blue Swirl blauwir...@gmail.com wrote:
 Hi,

 Here's some planning for getting most files compiled as few times as
 possible. Comments and suggestions are welcome.

I gave this some thought at some point.  The problems here start with
Recursive Make Considered Harmful (tm).

Look at how we jump through hoops to be able to compile things on
one side or the other.

We have:
Makefile
Makefile.target (really lots of them, one per target)
Makefile.hw
Makefile.user

If we had only a single Makefile, things in this department would be
much, much easier. And no, converting to a single Makefile is not trivial
either, but it would make things easier.

Why do we have several Makefiles?  Because we want to compile each file
with different options.

Why do we need to abuse VPATH so much?  Because we need to bring in files
randomly from $(ROOT), $(ROOT)/hw & $(ROOT)/$(TARGET).

The problem here is that there isn't a simple way to compile files for
several targets just once (there is nowhere to put them).

Our main compile rule is:

$(QEMU_PROG): $(obj-y) $(obj-$(TARGET_BASE_ARCH)-y)
	$(call LINK,$(obj-y) $(obj-$(TARGET_BASE_ARCH)-y))


(Notice that the things compiled in Makefile are trivial: they are already
compiled just once by definition.  The problems are with all the qemus we
compile.)

We could change: $(obj-$(TARGET_BASE_ARCH)-y) to something like:

OBJ-TARGET=s/.o/.$(TARGET_BASE_ARCH).o/

(I forgot the subst Makefile syntax), and have the:

%.$(TARGET_BASE_ARCH).o: %.c
   gcc $(TARGET_BASE_ARCH options)

From there, as you suggested, we have some files that are not compiled
per architecture but need to be compiled per board; well, we need to add
yet another level, obj-$(TARGET_BOARD) or whatever.

Notice that this is a lot of work, but you need the audit anyway to be
able to compile only once.  The problem just now is that there is no
simple way to describe that information; with my proposal it becomes
trivial to express:

obj-$(CONFIG_FOO) += foo.o  # You need this for everything
obj-mips-$(CONFIG_FOO) += foo.o  # You need this for all mips boards
obj-malta-$(CONFIG_FOO) += foo.o  # You need this for the mips malta board

You still need to do some different magic for hw-32/64 but it could be
done this way.  Once you have done it this way, you know where the files are
(hw or target) and you can drop the VPATH tricks.
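
To make the three levels concrete, here is a toy model of the proposal. It is a sketch, not a build system: the function names, the object names bar.o and baz.o, and the second board are made up for illustration. In GNU make itself the renaming step would presumably be written with patsubst; here it is modelled in Python so the effect on the number of compilations is easy to see.

```python
import os


def tagged(obj, tag):
    """foo.o + 'mips' -> foo.mips.o, mirroring %.$(TARGET_BASE_ARCH).o."""
    stem, ext = os.path.splitext(obj)
    return f"{stem}.{tag}{ext}"


def build_plan(obj_lists, targets):
    """Return the distinct object files needed to build every target.

    obj_lists maps 'common', an architecture name, or a board name to a
    list of object files; targets is a list of (arch, board) pairs.
    """
    plan = set(obj_lists.get("common", []))          # compiled once overall
    for arch, board in targets:
        plan.update(tagged(o, arch) for o in obj_lists.get(arch, []))
        plan.update(tagged(o, board) for o in obj_lists.get(board, []))
    return plan


# Hypothetical lists: one object per level, two boards sharing one arch.
obj_lists = {"common": ["foo.o"], "mips": ["bar.o"], "malta": ["baz.o"]}
targets = [("mips", "malta"), ("mips", "mipssim")]
plan = build_plan(obj_lists, targets)
print(sorted(plan))
```

Because the result is a set keyed by the tagged name, an object listed per architecture appears once however many boards of that architecture are built, which is the property the per-level obj- lists are meant to express.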

The problem with this proposal is that it is not trivial to do in small
steps, and the really big advantages appear when you switch to a single
Makefile at the end.

 vl.c: a lot of work. Maybe the CPUState stuff should be separated to a new 
 file.

That should just be a rule in the documentation: you can't put anything
else in vl.c.  If you move anything out of vl.c (see the timers work from
bonzini for example), you get a wild card for a free commit bypassing
maintainers or some similar prize :)

rest of files

I haven't really looked at them in depth.

I looked when I cleaned up the build system, I thought how to do the
next step (outlined before), but got sidetracked by other more urgent
things.

Later, Juan.

[Qemu-devel] Re: Exposing monitor on socket interface?

2010-03-24 Thread Juan Quintela

Jun Koi junkoi2...@gmail.com wrote:
 Hi,

 Is it possible to use -monitor option to expose the monitor on socket
 interface, such as TCP or Unix domain port, so I can access the
 monitor using non-stdio way?

man qemu

search -monitor

   -monitor dev
   Redirect the monitor to host device dev (same devices as the serial
   port).  The default device is vc in graphical mode and stdio in
   non graphical mode.

search -serial

  -serial dev
   Redirect the virtual serial port to host character device dev. The
   default device is vc in graphical mode and stdio in non
   graphical mode.

   This option can be used several times to simulate up to 4 serial
   ports.

   Use -serial none to disable all serial ports.

   Available character devices are:


  tcp:[host]:port[,server][,nowait][,nodelay]
   The TCP Net Console has two modes of operation.  It can send
   the serial I/O to a location or wait for a connection from a
   location.  By default the TCP Net Console is sent to host at
   the port.  If you use the server option QEMU will wait for a
   client socket application to connect to the port before
   continuing, unless the nowait option was specified.  The
   nodelay option disables the Nagle buffering algorithm.  If
   host is omitted, 0.0.0.0 is assumed. Only one TCP connection at
   a time is accepted. You can use telnet to connect to the
   corresponding character device.

   Example to send tcp console to 192.168.0.2 port 
   -serial tcp:192.168.0.2:

   Example to listen and wait on port  for connection
   -serial tcp::,server

   Example to not wait and listen on ip 192.168.0.100 port 
   -serial tcp:192.168.0.100:,server,nowait

   telnet:host:port[,server][,nowait][,nodelay]
   The telnet protocol is used instead of raw tcp sockets.  The
   options work the same as if you had specified -serial tcp.
   The difference is that the port acts like a telnet server or
   client using telnet option negotiation.  This will also allow
   you to send the MAGIC_SYSRQ sequence if you use a telnet that
   supports sending the break sequence.  Typically in unix telnet
   you do it with Control-] and then type send break followed by
   pressing the enter key.


I think that it is difficult to get more options than qemu in that
department :-)
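
As an illustration, suppose the guest was started with something like -monitor tcp:127.0.0.1:4444,server,nowait (the address and port are assumptions picked for the example). A client can then talk to the monitor over a plain socket. The sketch below is not part of QEMU; the helper names are hypothetical, and it relies only on the monitor printing its "(qemu) " prompt after each command.

```python
import socket

PROMPT = b"(qemu) "


def read_until_prompt(sock):
    """Collect bytes until the monitor prints its prompt or closes."""
    data = b""
    while not data.endswith(PROMPT):
        chunk = sock.recv(4096)
        if not chunk:          # peer closed the connection
            break
        data += chunk
    return data


def monitor_command(sock, command):
    """Send one command line and return everything up to the next prompt."""
    sock.sendall(command.encode() + b"\n")
    return read_until_prompt(sock).decode(errors="replace")


def connect_tcp(host="127.0.0.1", port=4444):
    """Connect to a monitor exposed with -monitor tcp:HOST:PORT,server,nowait."""
    sock = socket.create_connection((host, port))
    read_until_prompt(sock)    # discard the banner and the first prompt
    return sock


# Typical use (needs a running guest, so it is left commented out):
# mon = connect_tcp()
# print(monitor_command(mon, "info status"))
```

The same helpers should work unchanged on an AF_UNIX socket connected to the path given to -monitor unix:PATH,server,nowait, since they only use sendall and recv.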

Later, Juan.

[Qemu-devel] Re: Compile files only once: some planning

2010-03-24 Thread Paolo Bonzini




The harder cases are those where the device code depends somehow on
the architecture. Some thoughts follow.

vl.c: a lot of work. Maybe the CPUState stuff should be separated to a new file.

dma.c: DMA_schedule needs access to CPUState.


Most users of CPUState (e.g. qemu-timer.c and hw/dma.c) either need it 
as an opaque pointer, or only need access to target-independent stuff. 
So you could:


1) make CPUState define only common fields.  Include CPUState at the 
beginning of each per-target CPUXYZState.


2) Do s/CPUState/CPUXYZState/ on target-*/*.

3) Make it compile, possibly by undoing parts of 2) and changing parts 
of it to DO_UPCAST.
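
Steps 1) to 3) can be pictured with a toy model. The sketch below uses Python's ctypes only to mimic C struct layout; the field names, the class CPUMIPSState and the helper upcast are hypothetical and not taken from the QEMU source. It shows a common CPUState embedded at the start of a per-target state, and the pointer arithmetic that an upcast macro in the style of DO_UPCAST would perform to get from the common part back to the containing structure.

```python
import ctypes


class CPUState(ctypes.Structure):
    """Target-independent fields only (hypothetical selection)."""
    _fields_ = [("halted", ctypes.c_int),
                ("interrupt_request", ctypes.c_uint32)]


class CPUMIPSState(ctypes.Structure):
    """Per-target state with the common part as its first member."""
    _fields_ = [("common", CPUState),
                ("pc", ctypes.c_uint32),
                ("gpr", ctypes.c_uint32 * 32)]


def upcast(container_type, field_name, member):
    """Recover the containing struct from a reference to its member."""
    offset = getattr(container_type, field_name).offset
    return container_type.from_address(ctypes.addressof(member) - offset)


def generic_halt(cpu):
    """Target-independent code: it only ever sees the common CPUState."""
    cpu.halted = 1


env = CPUMIPSState()
env.pc = 0x80001000
generic_halt(env.common)                 # common code, no target knowledge
same_env = upcast(CPUMIPSState, "common", env.common)
print(hex(same_env.pc), same_env.common.halted)
```

Code such as generic_halt corresponds to the users that only need target-independent fields; only target code would ever call upcast.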


Paolo

[Qemu-devel] Re: [RFC] vhost-blk implementation

2010-03-24 Thread Michael S. Tsirkin

On Tue, Mar 23, 2010 at 12:55:07PM -0700, Badari Pulavarty wrote:
 Michael S. Tsirkin wrote:
 On Tue, Mar 23, 2010 at 10:57:33AM -0700, Badari Pulavarty wrote:
   
 Michael S. Tsirkin wrote:
 
 On Mon, Mar 22, 2010 at 05:34:04PM -0700, Badari Pulavarty wrote:
 
 Write Results:
 ==

 I see degraded IO performance when doing sequential IO write
 tests with vhost-blk compared to virtio-blk.

 # time dd of=/dev/vda if=/dev/zero bs=2M oflag=direct

 I get ~110MB/sec with virtio-blk, but I get only ~60MB/sec with
 vhost-blk. Wondering why ?
 
 Try to look at the number of interrupts and/or the number of exits.
 
 I checked interrupts and IO exits - there is no major noticeable
 difference between the vhost-blk and virtio-blk scenarios.
 
 It could also be that you are overrunning some queue.

 I don't see any exit mitigation strategy in your patch:
 when there are already lots of requests in a queue, it's usually
 a good idea to disable notifications and poll the
 queue as requests complete. That could help performance.
 
 Do you mean poll eventfd for new requests instead of waiting for new  
 notifications ?
 Where do you do that in vhost-net code ?
 

 vhost_disable_notify does this.

   
 Unlike a network socket, since we are dealing with a file, there is no
 ->poll support for it.  So I can't poll for the data.  Also, the issue I
 am having is on the write() side.
 

 Not sure I understand.

   
 I looked at it some more - I see 512K write requests on the
 virtio-queue in both the vhost-blk and virtio-blk cases.  Both qemu and
 vhost are doing synchronous writes to the page cache (there is no write
 batching in qemu that is affecting this case).  I am still puzzled about
 why virtio-blk outperforms vhost-blk.

 Thanks,
 Badari
 

 If you say the number of requests is the same, we are left with:
 - requests are smaller for some reason?
 - something is causing retries?
   
 No. IO request sizes are exactly the same (512K) in both cases. There are
 no retries or errors in either case. One thing I am not clear about is
 whether, for some reason, the guest kernel could push more data into the
 virtio-ring in the virtio-blk case than in the vhost-blk case. Is this
 possible? Does the guest get to run much sooner in the virtio-blk case
 than with vhost-blk? Sorry if it's a dumb question -
 I don't understand all the vhost details :(

 Thanks,
 Badari


You said above that you observed the same number of requests in userspace
versus kernel, and that the request size is the same as well. But somehow
more data is transferred? I'm confused.

-- 
MST

[Qemu-devel] Re: Exposing monitor on socket interface?

2010-03-24 Thread Jun Koi

Thanks a lot, Juan!

Jun


[Qemu-devel] Re: Completing big real mode emulation

2010-03-24 Thread Sheng Yang

On Saturday 20 March 2010 23:00:49 Alexander Graf wrote:
 Am 20.03.2010 um 15:02 schrieb Mohammed Gamal m.gamal...@gmail.com:
  On Sat, Mar 20, 2010 at 3:18 PM, Avi Kivity a...@redhat.com wrote:
  On 03/20/2010 10:55 AM, Alexander Graf wrote:
  I'd say that a GSoC project would rather focus on making a guest OS work
  than working on generic big real mode. Having Windows 98 support is way
  more visible to the users. And hopefully more fun to implement too, as
  it's a visible goal :-).
 
  Big real mode allows you to boot various OSes, such as that old
  Ubuntu/SuSE boot loader which triggered the whole thing.
 
  I thought legacy Windows uses it too?
 
  IIRC even current Windows (last I checked was XP, but it's probably true
  for newer) invokes big real mode inadvertently.  All it takes is not to
  clear fs and gs while switching to real mode.  It works because the real
  mode code never uses gs and fs (i.e. while we are technically in big real
  mode, the guest never relies on this), and because there are enough hacks
  in vmx.c to make it work (restoring fs and gs after the switch back).
  IIRC there are other cases of invalid guest state that we hack into place
  during mode switches.
 
  Either way - then we should make the goal of the project to support those
  old boot loaders. IMHO it should contain visibility. Doing theoretical
  stuff is just less fun for all parties. Or does that stuff work already?
 
  Mostly those old guests aged beyond usefulness.  They are still broken,
  but nobody installs new images.  Old images installed via workarounds
  work.
 
  Goals for this task could include:
 
   - get those older guests working
   - get emulate_invalid_guest_state=1 to work on all supported guests
   - switch to emulate_invalid_guest_state=1 as the default
   - drop the code supporting emulate_invalid_guest_state=0 eventually
 
  To this end I guess the next logical step is to compile a list of
  guests that are currently not working/work with hacks only, and get
  them working. Here are some suggestions:
  - MINIX 3.1.6 (developers have been recently filing bug reports
  because of boot failures)
  - Win XP with emulation enabled
  - FreeDOS with memory extenders
 
  Any other guests you'd like to see on this list?
 
 I remember old openSUSE iso bootloaders had issues. I think it was
 around 10.3, but might have been earlier.
 
At least the 10u2 installer has trouble. I spent some time on it and finally 
found it's due to ISOLINUX.

The basic issue is that it assumes the SS selector/base is unchanged when 
entering/exiting protected mode. At the time I cooked up a hack workaround
for it, but didn't think it was proper to upstream.

-- 
regards
Yang, Sheng

[Qemu-devel] Re: [libvirt] Supporting hypervisor specific APIs in libvirt

2010-03-24 Thread Daniel P. Berrange

On Mon, Mar 22, 2010 at 04:49:21PM -0500, Anthony Liguori wrote:
 On 03/22/2010 03:10 PM, Daniel P. Berrange wrote:
 This isn't necessarily libvirt's problem if its mission is to provide a
 common hypervisor API that covers the most commonly used features.
  
 That is more or less our current mission. If this mission leads to QEMU
 creating a non-libvirt based API & telling people to use that instead,
 then I'd say libvirt's mission needs to change to avoid that scenario !
 I strongly believe that libvirt's strategy is good for application
 developers over the medium to long term. We need to figure out how to
 get rid of the short term pain from the feature timelag, rather than
 inventing a new library API for them to use.

 
 Well that's certainly a good thing :-)
 
 However, for qemu, we need an API that covers all of our features that
 people can develop against.  The ultimate question we need to figure out
 is, should we encourage our users to always use libvirt or should we
 build our own API for people (and libvirt) to consume.
 
 I don't think it's necessarily a big technical challenge for libvirt to
 support qemu more completely.  I think it amounts to introducing a
 series of virQemu APIs that implement qemu specific functions.  Over
 time, qemu specific APIs can be deprecated in favour of more generic
 virDomain APIs.
  
 Stepping back a bit first, there are the two core areas in which people can
 be limited by libvirt currently.
 
   1. Monitor commands
   2. Command line flags
 
 Ultimately, IIUC, you are suggesting we need to allow arbitrary passthrough
 for both of these in libvirt.
 
 At the libvirt level, we have 3 core requirements
 
   1. The XML format is extend only (new elements allowed, or add attributes
  or children to existing elements)
   2. The C library API is append only (new symbols only)
   3. The RPC wire protocol is append only (maps 1-1 to the C API generally)

 
 We have a slightly different mentality within QEMU I think.  Here's 
 roughly how I'd characterize our guarantees.
 
 1. For any two versions of QEMU, we try to guarantee that the same VM, 
 as far as the guest sees it, can be created.
 2. We tend to avoid changing command line syntax unless the syntax was 
 previously undefined.
 3. QMP supports enumeration and feature negotiation.  This enables a 
 client to discover which functions are supported.
 4. We try to maintain monitor interfaces but provide no guarantees of 
 compatibility.

Points 2 & 4 make it very hard for libvirt to use any library API
that QEMU might expose. We need to support multiple concurrently
running versions of QEMU on a host, to cope with the package upgrade
scenario & ad hoc testing of new versions. If a libqemu.so for talking 
to QEMU changed a monitor interface & didn't have backwards compatibility
for older QEMU versions, then it is not something we could use, because 
any particular libvirt build would be tied to only being able to talk 
to one specific QEMU version. Currently we internally deal with changes 
in syntax by detecting which format/protocol we need to use at runtime, and
we need to maintain that ability.
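
A rough illustration of the kind of runtime detection described here: probe what a given QEMU binary advertises and pick the command-line dialect accordingly. Everything in this sketch is hypothetical (the function names, the capability names and the sample help text); it is not libvirt's actual detection code, which this thread does not show.

```python
def detect_capabilities(help_text):
    """Map substrings of a binary's help output to capability names."""
    probes = {
        "device": "-device ",
        "chardev": "-chardev ",
        "enable_kvm": "-enable-kvm",
    }
    return {name for name, needle in probes.items() if needle in help_text}


def disk_args(image, caps):
    """Choose the disk syntax that the probed binary understands."""
    if "device" in caps:
        # Newer style: separate backend (-drive) and frontend (-device).
        return ["-drive", f"file={image},if=none,id=drive0",
                "-device", "virtio-blk-pci,drive=drive0"]
    # Older style: a single -drive option carrying the interface type.
    return ["-drive", f"file={image},if=virtio"]


# Made-up help snippets standing in for two QEMU versions on one host.
OLD_HELP = "-drive [file=file][,if=type] use 'file' as a drive image\n"
NEW_HELP = OLD_HELP + "-device driver[,prop[=value][,...]] add device\n"

for text in (OLD_HELP, NEW_HELP):
    print(disk_args("guest.img", detect_capabilities(text)))
```

The design point is that the decision is made per binary at runtime, so one management build can drive several QEMU versions installed side by side.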

Daniel
-- 
|: Red Hat, Engineering, London-o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org-o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

Re: [Qemu-devel] Re: [libvirt] Supporting hypervisor specific APIs in libvirt

2010-03-24 Thread Daniel P. Berrange

On Wed, Mar 24, 2010 at 07:17:26AM +0200, Avi Kivity wrote:
 On 03/23/2010 08:00 PM, Avi Kivity wrote:
 On 03/23/2010 06:06 PM, Anthony Liguori wrote:
 I thought the monitor protocol *was* our API. If not, why not?
 
 It is.  But our API is missing key components like guest 
 enumeration.  So the fundamental topic here is, do we introduce these 
 missing components to allow people to build directly to our interface 
 or do we make use of the functionality that libvirt already provides 
 if they can plumb our API directly to users.
 
 
 Guest enumeration is another API.
 
 Over the kvm call I suggested a qemu concentrator that would keep 
 track of all running qemus, and would hand out monitor connections to 
 users.  It can do the enumeration (likely using qmp).  Libvirt could 
 talk to that, like it does with other hypervisors.
 
 
 To elaborate
 
 qemud
   - daemonizes itself
   - listens on /var/lib/qemud/guests for incoming guest connections
   - listens on /var/lib/qemud/clients for incoming client connections
   - filters access according to uid (SCM_CREDENTIALS)
   - can pass a new monitor to client (SCM_RIGHTS)
   - supports 'list' command to query running guests
   - async messages on guest startup/exit
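
The SCM_RIGHTS item in the list above refers to passing an open file descriptor across a Unix domain socket. A minimal sketch of that mechanism follows, assuming Python 3.9 or later on a Unix host (socket.send_fds and socket.recv_fds wrap SCM_RIGHTS). The function names and the one-word message are hypothetical, and the proposed qemud itself is not implemented anywhere in this thread.

```python
import socket


def hand_out_monitor(client_conn, monitor_sock):
    """Daemon side: send an already-open monitor socket to a client."""
    socket.send_fds(client_conn, [b"monitor"], [monitor_sock.fileno()])


def receive_monitor(daemon_conn):
    """Client side: receive the descriptor and wrap it as a socket."""
    _msg, fds, _flags, _addr = socket.recv_fds(daemon_conn, 64, 1)
    return socket.socket(fileno=fds[0])


# Socket pairs stand in for the daemon/client and guest/monitor links.
daemon_side, client_side = socket.socketpair()
guest_end, monitor_end = socket.socketpair()

hand_out_monitor(daemon_side, monitor_end)
monitor = receive_monitor(client_side)

# The client now holds its own descriptor for the monitor connection.
monitor.sendall(b"info status\n")
received = guest_end.recv(64)
print(received)
```

After the hand-off the client talks to the guest directly; the daemon is only involved in deciding who gets a descriptor.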

My concern is that once you provide this, then next someone wants it to
list inactive guests too. Once you list inactive guests, then you'll
want this to start a guest. Once you start guests then you want cgroups
integration, selinux labelling & so on, until it ends up replicating all
of libvirt's QEMU functionality.

To be able to use the list functionality from libvirt, we need this daemon
to also guarantee id, name & uuid uniqueness for all VMs, both running and
inactive, with separate namespaces for the system vs per-user lists. Or
we have to ignore any instances listed by qemud that were not started by
libvirt, which rather defeats the purpose.

The filtering access part of this daemon also does not map well onto
libvirt's access model, because we don't solely filter based on UID in
libvirtd. We have it configurable based on UID, policykit, SASL, TLS/x509
already, and intend adding role based access control to further filter
things, integrating with the existing apparmor/selinux security models.
A qemud that filters based on UID only gives users a side-channel to get
around libvirt's access control.

Daniel
-- 
|: Red Hat, Engineering, London-o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org-o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

Re: [Qemu-devel] Re: [libvirt] Supporting hypervisor specific APIs in libvirt

2010-03-24 Thread Avi Kivity


On 03/24/2010 12:36 PM, Daniel P. Berrange wrote:

On Wed, Mar 24, 2010 at 07:17:26AM +0200, Avi Kivity wrote:
   

On 03/23/2010 08:00 PM, Avi Kivity wrote:
 

On 03/23/2010 06:06 PM, Anthony Liguori wrote:
   

I thought the monitor protocol *was* our API. If not, why not?
   

It is.  But our API is missing key components like guest
enumeration.  So the fundamental topic here is, do we introduce these
missing components to allow people to build directly to our interface
or do we make use of the functionality that libvirt already provides
if they can plumb our API directly to users.

 

Guest enumeration is another API.

Over the kvm call I suggested a qemu concentrator that would keep
track of all running qemus, and would hand out monitor connections to
users.  It can do the enumeration (likely using qmp).  Libvirt could
talk to that, like it does with other hypervisors.

   

To elaborate

qemud
   - daemonizes itself
   - listens on /var/lib/qemud/guests for incoming guest connections
   - listens on /var/lib/qemud/clients for incoming client connections
   - filters access according to uid (SCM_CREDENTIALS)
   - can pass a new monitor to client (SCM_RIGHTS)
   - supports 'list' command to query running guests
   - async messages on guest startup/exit
 

My concern is that once you provide this, then next someone wants it to
list inactive guests too.


That's impossible, since qemud doesn't manage config files or disk 
images.  It can't even launch guests!



Once you list inactive guests, then you'll
want this to start a guest. Once you start guests then you want cgroups
integration, selinux labelling & so on, until it ends up replicating all
of libvirt's QEMU functionality.

To be able to use the list functionality from libvirt, we need this daemon
to also guarantee id, name & uuid uniqueness for all VMs, both running and
inactive, with separate namespaces for the system vs per-user lists. Or
we have to ignore any instances listed by qemud that were not started by
libvirt, which rather defeats the purpose.
   


qemud won't guarantee name uniqueness or provide uuids.


The filtering access part of this daemon also does not map well onto
libvirt's access model, because we don't solely filter based on UID in
libvirtd. We have it configurable based on UID, policykit, SASL, TLS/x509
already, and intend adding role based access control to further filter
things, integrating with the existing apparmor/selinux security models.
A qemud that filters based on UID only gives users a side-channel to get
around libvirt's access control.
   


That's true.  Any time you write a multiplexer these issues crop up.  
Much better to stay in single process land where everything is already 
taken care of.


So, at best qemud is a toy for people who are annoyed by libvirt.

--
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] Re: Compile files only once: some planning

2010-03-24 Thread Richard Henderson


On 03/24/2010 02:47 AM, Paolo Bonzini wrote:

1) make CPUState define only common fields. Include CPUState at the
beginning of each per-target CPUXYZState.


Irritatingly, the common fields contain quite big TLBs.  And the
offsets from the start of env affect the compactness of the code
generated from TCG.  We really really want the general registers
to come first to make sure that those offsets fit the host's
reg+offset addressing mode.
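
The effect on offsets can be seen with a toy layout. The sketch below uses Python's ctypes only to compute C-like struct offsets; the field names and the array sizes are made-up placeholders, not QEMU's real definitions.

```python
import ctypes


class CommonFields(ctypes.Structure):
    """Stand-in for the large common part (sizes are arbitrary)."""
    _fields_ = [("tlb_table", ctypes.c_uint64 * 4096),
                ("halted", ctypes.c_int)]


class CommonFirst(ctypes.Structure):
    """Common fields at the beginning, general registers after them."""
    _fields_ = [("common", CommonFields),
                ("gpr", ctypes.c_uint64 * 32)]


class RegistersFirst(ctypes.Structure):
    """General registers at the beginning, common fields after them."""
    _fields_ = [("gpr", ctypes.c_uint64 * 32),
                ("common", CommonFields)]


for layout in (CommonFirst, RegistersFirst):
    first = layout.gpr.offset
    last = first + ctypes.sizeof(ctypes.c_uint64) * 31
    print(layout.__name__, "gpr offsets:", first, "to", last)
```

With the registers first, every register offset stays within a few hundred bytes of the start of env. With the common part first, the registers only begin after the whole TLB array, so their offsets may no longer fit a short reg+offset displacement on the host.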


r~

[Qemu-devel] Guest memory mapping in Qemu

2010-03-24 Thread Michael T

Hello,

This is an idle question in the sense that, much as I would like to, I know
for a fact that I won't have the time to look at implementing this. I'm not
expecting other people to seriously look at doing it either, but I would be
interested in your thoughts.

If the technical documentation at
http://www.usenix.org/publications/library/proceedings/usenix05/tech/freenix/full_papers/bellard/bellard_html/index.html
is still valid (I think it is), Qemu has two modes of handling access to
guest memory - system emulation, in which an entire guest address space is
mapped on the host, and emulated MMU. I was wondering whether something
in-between would also be feasible. That is, chunks of guest address space
(say 4MB chunks for the sake of the argument) are mmapped into the address
space of the Qemu process on the host, and when an access to guest memory is
made, there is an initial check to see whether it is in the same chunk as
the last one, in which case all the MMU emulation bits could be saved. I
could imagine Qemu keeping a current/most recent chunk for each register
which can be used for relative addressing, plus one for
non-register-relative accesses. It seems to me that this could potentially
speed up memory access quite a bit, and as a bonus even make it easy to
support x86 segmentation (as part of the bounds check for whether a memory
access is in a chunk).

I realise of course that I have glibly glossed over all the nasty bits - off
the top of my head: keeping track of all the mapped chunks in the host
address space, lookups to see if an access outside of the current chunk is
inside another mapped one, invalidating chunks when the guest page tables
they are based on change. I am sure that there are many more issues...
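
The fast path described in this message might look roughly like the following. This is a toy model, not QEMU code: the class name ChunkCache, the slow_lookup callback and the host base addresses are hypothetical, and the 4 MB chunk size is the figure used above for the sake of argument.

```python
CHUNK_SHIFT = 22                       # 4 MB chunks
CHUNK_MASK = (1 << CHUNK_SHIFT) - 1


class ChunkCache:
    """Remember the most recently used chunk to skip the slow lookup."""

    def __init__(self, slow_lookup):
        # slow_lookup(chunk_number) -> host base address; it stands in for
        # the full MMU emulation and mapping bookkeeping.
        self.slow_lookup = slow_lookup
        self.last_chunk = None
        self.last_host_base = None
        self.slow_lookups = 0

    def translate(self, guest_addr):
        chunk = guest_addr >> CHUNK_SHIFT
        if chunk != self.last_chunk:   # miss: fall back to the slow path
            self.slow_lookups += 1
            self.last_host_base = self.slow_lookup(chunk)
            self.last_chunk = chunk
        return self.last_host_base + (guest_addr & CHUNK_MASK)

    def invalidate(self):
        """Call when the guest page tables behind the chunk change."""
        self.last_chunk = None


# Hypothetical mapping of two guest chunks to host addresses.
mapping = {0: 0x10000000, 1: 0x20000000}
cache = ChunkCache(mapping.__getitem__)
for addr in (0x10, 0x20, 0x3FFFFF, 0x400000):
    print(hex(addr), "->", hex(cache.translate(addr)))
print("slow lookups:", cache.slow_lookups)
```

A per-register variant, as suggested above, would simply keep one such cache per base register plus one for non-register-relative accesses. The invalidation hook is where the hard part mentioned in the last paragraph would live.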
