Re: [Qemu-devel] Re: Supporting hypervisor specific APIs in libvirt

2010-03-24 Thread Gerd Hoffmann

On 03/24/10 00:13, Jamie Lokier wrote:

Gerd Hoffmann wrote:

- networking: man, setting networking is a mess, libvirt just does it
   for you.


+1

Even when not using libvirt for a reason or another I usually hook my
virtual machines into virbr0 (libvirt default network).


I had the opposite problem.  Needed to use multiple bridges and have
some VMs behind NAT without a bridge (private IPs), and some using
separately firewalled bridges (needed to behave like real attached
hardware with their original MACs, but be firewalled).


No problem in theory.  libvirt should detect existing bridges and allow 
you to attach virtual machines to them.  So you can setup bridges and 
firewalling for them using usual distro tools and use them for virtual 
machines.


In practice I've seen this not working correctly in the past, i.e. my 
br0 didn't pop up in the virt-manager nic setup page.


cheers,
  Gerd





[Qemu-devel] Re: Supporting hypervisor specific APIs in libvirt

2010-03-24 Thread Juan Quintela
Andi Kleen a...@firstfloor.org wrote:
 Juan Quintela quint...@redhat.com writes:

 - networking: man, setting networking is a mess, libvirt just does it
   for you.

 Agreed it's messy, but isn't this something that the standard qemu
 command line tool could potentially do better by itself? I don't see why you 
 need a wrapper for that.

In my case, basically it is MAC addresses.  I have dhcp setup, and it
always give the same IP to the same MAC.  But you have to remember to
type the MAC addresses.

This is the typical command line that virsh start launch for me:

/usr/libexec/qemu-kvm -S -M pc-0.12 -enable-kvm -m 1024 -smp
2,sockets=2,cores=1,threads=1 -name f12X-64 -uuid
1fbe73a6-f519-e848-03bd-6636f765d143 -nodefaults -chardev
socket,id=monitor,path=/var/lib/libvirt/qemu/f12X-64.monitor,server,nowait
-mon chardev=monitor,mode=readline -rtc base=utc -boot c -drive
file=/mnt/kvm/images/f12X-64.img,if=none,id=drive-virtio-disk0,boot=on,cache=none
-device
virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
-device
virtio-net-pci,vlan=0,id=net0,mac=54:52:00:44:72:e6,bus=pci.0,addr=0x5
-net tap,fd=18,vlan=0,name=hostnet0 -chardev pty,id=serial0 -device
isa-serial,chardev=serial0 -usb -device usb-tablet,id=input0 -vnc
127.0.0.1:0 -k es -vga cirrus -device
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3

From parts:

/usr/libexec/qemu-kvm
 -S

I don't want that
 -M pc-0.12

I don't care.

-enable-kvm

I _want_ :)
-m 1024

Also god idea

-smp 2,sockets=2,cores=1,threads=1

by hand it is always -smp 2
-name f12X-64

-uuid 1fbe73a6-f519-e848-03bd-6636f765d143

don't care

 -nodefaults

-chardev
socket,id=monitor,path=/var/lib/libvirt/qemu/f12X-64.monitor,server,nowait
-mon chardev=monitor,mode=readline

this is simplified as:
  -monitor stdio
when I launch it by hand.

 -rtc base=utc -boot c

don't care

-drive 
file=/mnt/kvm/images/f12X-64.img,if=none,id=drive-virtio-disk0,boot=on,cache=none
-device 
virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0

this is _wow_, I only want to put the disk image path and convince it to
use virtio driver

-device virtio-net-pci,vlan=0,id=net0,mac=54:52:00:44:72:e6,bus=pci.0,addr=0x5
-net tap,fd=18,vlan=0,name=hostnet0

this always have to be changed. s/fd=18/script=/etc/kvm-ifup/
and then I normally found that I want downscript= to avoid the warning
at exit time.  If I don't put a mac address, qemu command line works
well, but as I normally also use vnc I have to:
- launch qemu
- kill it, relaunch with -vnc :0 instead of -vnc 127.0.0.1:0
- re-launch qemu
- connect to vnc
- check what address the dhcp server was giving to it this time
- I can ssh to the client now
with libvirt handling the command line, I just ssh to the same dhcp
address that was given the previous time/day/...

-chardev pty,id=serial0 -device isa-serial,chardev=serial0

I only use serial from time to time, and using -serial
tcp:0,server,nowait (or whatever is the sintax is easier by hand)

-usb -device usb-tablet,id=input0

usb tablet is mandatory, just in case the guest is able to _not_ grab
the mouse.

-vnc 127.0.0.1:0

Allways wrong in my case, because I want to run the vnc client in a
different machine.  a way to convince virt-viewer to connect to a qemu
launched by hand, or a way to convince libvirt to let me edit the
command line will be great.

 -k es -vga cirrus

this get right by default.

-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3

I normally don't use balloon.

Notice for the normally I don't care bits, that at the end I always
care.  Why?  because then somebody arrives and told me that sound don't
work, and I have to edit the config file, and add sound option.  add a
sound option to the command line of qemu is not too complicate.

The other big problem for me are snapshots,  I have to remember
_exactly_ what was the qemu command line with which I saved the
snapshot.  Guess what, I normally don't remember and end:

- launching old qemu
- save a new snapshot
- test with the new qemu and new snapshot (because now I have the
  command line that I launched 5 mins before).

Just in case it helps.

Later, Juan.




[Qemu-devel] Re: Compile files only once: some planning

2010-03-24 Thread Juan Quintela
Blue Swirl blauwir...@gmail.com wrote:
 Hi,

 Here's some planning for getting most files compiled as few times as
 possible. Comments and suggestions are welcome.

I took some thought about this at some point.  Problems here start from
Recursive Makefile condered Harmful (tm).

Look at how we jump through hops to be able to compile things in
one/other side.

We have:
Makefile
Makefile.target (really lots of them, one for target)
Makefile.hw
Makefile.user

If we had only a single Makefile, things in this department would be
much, much easier. And no, convert to a single Makefile is not trivial
either, but it would make things easier.

Why do we have several Makefiles?  Because we want to compile each file
with different options.

Why do we need to abuse so much VPATH?  Because we need to bring files
randomly from $(ROOT), $(ROOT)/hw  $(ROOT)/$(TARGET).

Problem here, there isn't a simple way to compile files for several
target just once (no way to put them).

Our main copmile rule is:

$(QEMU_PROG): $(obj-y) $(obj-$(TARGET_BASE_ARCH)-y)
$(call LINK,$(obj-y) $(obj-$(TARGET_BASE_ARCH)-y))


(notice that things compiled in Makefile are trivial, they are already
compiled just once by definition, problems are for all the qemu's we
compile).

We could change: $(obj-$(TARGET_BASE_ARCH)-y) to something like:

OBJ-TARGET=s/.o/.$(TARGET_BASE_ARCH).o/

(I forgot the subst Makefile syntax), and have the:

%.$(TARGET_BASE_ARCH).o: %.c
   gcc $(TARGET_BASE_ARCH options)

From there, as you suggested, we need some files that are not compiled
by architecture, they need to be compiled by board, well, we need to add
yet another level obj-$(TARGET_BOARD) or whatever.

Notice that this is a lot of work, but you are needing the audit to be
able to compile only once.  Problem just now is that there is not a
simple way to describe that information,  with my proposal it gets
trivial to express:

obj-$(CONFIG_FOO) += foo.o  # You need this for everything
obj-mips-$(CONFIG_FOO) += foo.o  # You need this for all mips boards
obj-malta-$(CONFIG_FOO) += foo.o  # You need this for all mips malta board

You still need to do some different magic from hw-32/64 but it could be
done this way.  Once you did it this way, you now where the files are
(hw or target) and you can drop the VPATH tricks.

Problem with this proposal is that it is not trivial to do in little
steps, and the real big advantages appear when you switch to a single
Makefile at the end.

 vl.c: a lot of work. Maybe the CPUState stuff should be separated to a new 
 file.

That should just be a rule in Documentation.  You can't but anything
else in vl.c.  If you move anything out of vl.c (see timers work from
bonzini for example), you get a wild card for free commit bypassing
maintainers or some similar price :)

rest of files

I haven't really looked at them at depth.

I looked when I cleaned up the build system, I thought how to do the
next step (outlined before), but got sidetracked by other more urgent
things.

Later, Juan.




[Qemu-devel] Re: Exposing monitor on socket interface?

2010-03-24 Thread Juan Quintela
Jun Koi junkoi2...@gmail.com wrote:
 Hi,

 Is it possible to use -monitor option to expose the monitor on socket
 interface, such as TCP or Unix domain port, so I can access the
 monitor using non-stdio way?

man qemu

search -monitor

   -monitor dev
   Redirect the monitor to host device dev (same devices as the serial
   port).  The default device is vc in graphical mode and stdio in
   non graphical mode.

search -serial

  -serial dev
   Redirect the virtual serial port to host character device dev. The
   default device is vc in graphical mode and stdio in non
   graphical mode.

   This option can be used several times to simulate up to 4 serial
   ports.

   Use -serial none to disable all serial ports.

   Available character devices are:


  tcp:[host]:port[,server][,nowait][,nodelay]
   The TCP Net Console has two modes of operation.  It can send
   the serial I/O to a location or wait for a connection from a
   location.  By default the TCP Net Console is sent to host at
   the port.  If you use the server option QEMU will wait for a
   client socket application to connect to the port before
   continuing, unless the nowait option was specified.  The
   nodelay option disables the Nagle buffering algorithm.  If
   host is omitted, 0.0.0.0 is assumed. Only one TCP connection at
   a time is accepted. You can use telnet to connect to the
   corresponding character device.

   Example to send tcp console to 192.168.0.2 port 
   -serial tcp:192.168.0.2:

   Example to listen and wait on port  for connection
   -serial tcp::,server

   Example to not wait and listen on ip 192.168.0.100 port 
   -serial tcp:192.168.0.100:,server,nowait

   telnet:host:port[,server][,nowait][,nodelay]
   The telnet protocol is used instead of raw tcp sockets.  The
   options work the same as if you had specified -serial tcp.
   The difference is that the port acts like a telnet server or
   client using telnet option negotiation.  This will also allow
   you to send the MAGIC_SYSRQ sequence if you use a telnet that
   supports sending the break sequence.  Typically in unix telnet
   you do it with Control-] and then type send break followed by
   pressing the enter key.


I think that it is difficult to get more options that qemu in that
department :-)

Later, Juan.




[Qemu-devel] Re: Compile files only once: some planning

2010-03-24 Thread Paolo Bonzini



The harder cases are those where the device code depends somehow on
the architecture. Some thoughts follow.

vl.c: a lot of work. Maybe the CPUState stuff should be separated to a new file.

dma.c: DMA_schedule needs access to CPUState.


Most users of CPUState (e.g. qemu-timer.c and hw/dma.c) either need it 
as an opaque pointer, or only need access to target-independent stuff. 
So you could:


1) make CPUState define only common fields.  Include CPUState at the 
beginning of each per-target CPUXYZState.


2) Do s/CPUState/CPUXYZState/ on target-*/*.

3) Make it compile, possibly by undoing parts of 2) and changing parts 
of it to DO_UPCAST.


Paolo




[Qemu-devel] Re: [RFC] vhost-blk implementation

2010-03-24 Thread Michael S. Tsirkin
On Tue, Mar 23, 2010 at 12:55:07PM -0700, Badari Pulavarty wrote:
 Michael S. Tsirkin wrote:
 On Tue, Mar 23, 2010 at 10:57:33AM -0700, Badari Pulavarty wrote:
   
 Michael S. Tsirkin wrote:
 
 On Mon, Mar 22, 2010 at 05:34:04PM -0700, Badari Pulavarty wrote:
 
 Write Results:
 ==

 I see degraded IO performance when doing sequential IO write
 tests with vhost-blk compared to virtio-blk.

 # time dd of=/dev/vda if=/dev/zero bs=2M oflag=direct

 I get ~110MB/sec with virtio-blk, but I get only ~60MB/sec with
 vhost-blk. Wondering why ?
 
 Try to look and number of interrupts and/or number of exits.
 
 I checked interrupts and IO exits - there is no major noticeable   
 difference between
 vhost-blk and virtio-blk scenerios.
 
 It could also be that you are overrunning some queue.

 I don't see any exit mitigation strategy in your patch:
 when there are already lots of requests in a queue, it's usually
 a good idea to disable notifications and poll the
 queue as requests complete. That could help performance.
 
 Do you mean poll eventfd for new requests instead of waiting for new  
 notifications ?
 Where do you do that in vhost-net code ?
 

 vhost_disable_notify does this.

   
 Unlike network socket, since we are dealing with a file, there is no  
 -poll support for it.
 So I can't poll for the data. And also, Issue I am having is on the   
 write() side.
 

 Not sure I understand.

   
 I looked at it some more - I see 512K write requests on the
 virtio-queue  in both vhost-blk and virtio-blk cases. Both qemu or
 vhost is doing synchronous  writes to page cache (there is no write
 batching in qemu that is affecting this  case).  I still puzzled on
 why virtio-blk outperforms vhost-blk.

 Thanks,
 Badari
 

 If you say the number of requests is the same, we are left with:
 - requests are smaller for some reason?
 - something is causing retries?
   
 No. IO requests sizes are exactly same (512K) in both cases. There are  
 no retries or
 errors in both cases. One thing I am not clear is - for some reason  
 guest kernel
 could push more data into virtio-ring in case of virtio-blk vs  
 vhost-blk. Is this possible ?
 Does guest gets to run much sooner in virtio-blk case than vhost-blk ?  
 Sorry, if its dumb question -
 I don't understand  all the vhost details :(

 Thanks,
 Badari


You said you observed same number of requests in userspace versus kernel above.
And request size is the same as well. But somehow more data is
transferred? I'm confused.

-- 
MST




[Qemu-devel] Re: Exposing monitor on socket interface?

2010-03-24 Thread Jun Koi
Thanks a lot, Juan!

Jun

On Wed, Mar 24, 2010 at 6:41 PM, Juan Quintela quint...@redhat.com wrote:
 Jun Koi junkoi2...@gmail.com wrote:
 Hi,

 Is it possible to use -monitor option to expose the monitor on socket
 interface, such as TCP or Unix domain port, so I can access the
 monitor using non-stdio way?

 man qemu

 search -monitor

       -monitor dev
           Redirect the monitor to host device dev (same devices as the serial
           port).  The default device is vc in graphical mode and stdio in
           non graphical mode.

 search -serial

      -serial dev
           Redirect the virtual serial port to host character device dev. The
           default device is vc in graphical mode and stdio in non
           graphical mode.

           This option can be used several times to simulate up to 4 serial
           ports.

           Use -serial none to disable all serial ports.

           Available character devices are:

 
          tcp:[host]:port[,server][,nowait][,nodelay]
               The TCP Net Console has two modes of operation.  It can send
               the serial I/O to a location or wait for a connection from a
               location.  By default the TCP Net Console is sent to host at
               the port.  If you use the server option QEMU will wait for a
               client socket application to connect to the port before
               continuing, unless the nowait option was specified.  The
               nodelay option disables the Nagle buffering algorithm.  If
               host is omitted, 0.0.0.0 is assumed. Only one TCP connection at
               a time is accepted. You can use telnet to connect to the
               corresponding character device.

               Example to send tcp console to 192.168.0.2 port 
                   -serial tcp:192.168.0.2:

               Example to listen and wait on port  for connection
                   -serial tcp::,server

               Example to not wait and listen on ip 192.168.0.100 port 
                   -serial tcp:192.168.0.100:,server,nowait

           telnet:host:port[,server][,nowait][,nodelay]
               The telnet protocol is used instead of raw tcp sockets.  The
               options work the same as if you had specified -serial tcp.
               The difference is that the port acts like a telnet server or
               client using telnet option negotiation.  This will also allow
               you to send the MAGIC_SYSRQ sequence if you use a telnet that
               supports sending the break sequence.  Typically in unix telnet
               you do it with Control-] and then type send break followed by
               pressing the enter key.


 I think that it is difficult to get more options that qemu in that
 department :-)

 Later, Juan.





[Qemu-devel] Re: Completing big real mode emulation

2010-03-24 Thread Sheng Yang
On Saturday 20 March 2010 23:00:49 Alexander Graf wrote:
 Am 20.03.2010 um 15:02 schrieb Mohammed Gamal m.gamal...@gmail.com:
  On Sat, Mar 20, 2010 at 3:18 PM, Avi Kivity a...@redhat.com wrote:
  On 03/20/2010 10:55 AM, Alexander Graf wrote:
  I'd say that a GSoC project would rather focus on making a guest
  OS work
  than working on generic big real mode. Having Windows 98 support
  is way more
  visible to the users. And hopefully more fun to implement too,
  as it's a
  visible goal :-).
 
  Big real mode allows you to boot various OSes, such as that old
  Ubuntu/SuSE boot loader which triggered the whole thing.
 
  I thought legacy Windows uses it too?
 
  IIRC even current Windows (last I checked was XP, but it's probably
  true for
  newer) invokes big real mode inadvertently.  All it takes is not to
  clear fs
  and gs while switching to real mode.  It works because the real
  mode code
  never uses gs and fs (i.e. while we are technically in big real
  mode, the
  guest never relies on this), and because there are enough hacks in
  vmx.c to
  make it work (restoring fs and gs after the switch back).  IIRC
  there are
  other cases of invalid guest state that we hack into place during
  mode
  switches.
 
  Either way - then we should make the goal of the project to
  support those
  old boot loaders. IMHO it should contain visibility. Doing
  theoretical stuff
  is just less fun for all parties. Or does that stuff work already?
 
  Mostly those old guests aged beyond usefulness.  They are still
  broken, but
  nobody installs new images.  Old images installed via workarounds
  work.
 
  Goals for this task could include:
 
   - get those older guests working
   - get emulate_invalid_guest_state=1 to work on all supported guests
   - switch to emulate_invalid_guest_state=1 as the default
   - drop the code supporting emulate_invalid_guest_state=0 eventually
 
  To this end I guess the next logical step is to compile a list of
  guests that are currently not working/work with hacks only, and get
  them working. Here are some suggestions:
  - MINIX 3.1.6 (developers have been recently filing bug reports
  because of boot failures)
  - Win XP with emulation enabled
  - FreeDOS with memory extenders
 
  Any other guests you'd like to see on this list?
 
 I remember old openSUSE iso bootloaders had issues. I think it was
 around 10.3, but might have been earlier.
 
At least 10u2 installer has trouble. I had spent some time on it, finally 
found it's due to ISOLINUX.

The basic issue is it assume that SS selector/base is unchanged when 
enter/exit protect mode. At that time, I've cooked a hack workaround for it, 
but didn't think it's proper to upstream.

-- 
regards
Yang, Sheng




[Qemu-devel] Re: [libvirt] Supporting hypervisor specific APIs in libvirt

2010-03-24 Thread Daniel P. Berrange
On Mon, Mar 22, 2010 at 04:49:21PM -0500, Anthony Liguori wrote:
 On 03/22/2010 03:10 PM, Daniel P. Berrange wrote:
 This isn't necessarily libvirt's problem if it's mission is to provide a
 common hypervisor API that covers the most commonly used features.
  
 That is more or less our current mission. If this mission leads to QEMU
 creating a non-libvirt based API  telling people to use that instead,
 then I'd say libvirt's mission needs to change to avoid that scenario !
 I strongly believe that libvirt's strategy is good for application
 developers over the medium to long term. We need to figure out how to
 get rid of the short term pain from the feature timelag, rather than
 inventing a new library API for them to use.

 
 Well that's certainly a good thing :-)
 
 However, for qemu, we need an API that covers all of our features that
 people can develop against.  The ultimate question we need to figure out
 is, should we encourage our users to always use libvirt or should we
 build our own API for people (and libvirt) to consume.
 
 I don't think it's necessarily a big technical challenge for libvirt to
 support qemu more completely.  I think it amounts to introducing a
 series of virQemu APIs that implement qemu specific functions.  Over
 time, qemu specific APIs can be deprecated in favour of more generic
 virDomain APIs.
  
 Stepping back a bit first, there are the two core areas in which people can
 be limited by libvirt currently.
 
   1. Monitor commands
   2. Command line flags
 
 Ultimately, IIUC, you are suggesting we need to allow arbitrary passthrough
 for both of these in libvirt.
 
 At the libvirt level, we have 3 core requirements
 
   1. The XML format is extend only (new elements allowed, or add attributes
  or children to existing elements)
   2. The C library API is append only (new symbols only)
   3. The RPC wire protocol is append only (maps 1-1 to the C API generally)

 
 We have a slightly different mentality within QEMU I think.  Here's 
 roughly how I'd characterize our guarantees.
 
 1. For any two versions of QEMU, we try to guarantee that the same VM, 
 as far as the guest sees it, can be created.
 2. We tend to avoid changing command line syntax unless the syntax was 
 previously undefined.
 3. QMP supports enumeration and feature negotiation.  This enables a 
 client to discover which functions are supported.
 4. We try to maintain monitor interfaces but provide no guarantees of 
 compatibility.

Points 2  4 make it very hard for libvirt to use any library API
that QEMU might expose. We need to support multiple concurrently
running versions of QEMU on a host, to cope with the package upgrade
scenario  adhoc testing of new versions. If a libqemu.so for talking 
to QEMU changed a monitor interface  didn't have backwards compatability
for older QEMU version, then it is not something we could use, because 
any particular libvirt build would be tied to only being able to talk 
to the specific QEMU version. Currently we internally deal with changes 
in syntax detecting which format/protocol we need to use at runtime and
need to maintain that ability.

Daniel
-- 
|: Red Hat, Engineering, London-o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org-o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|




Re: [Qemu-devel] Re: [libvirt] Supporting hypervisor specific APIs in libvirt

2010-03-24 Thread Daniel P. Berrange
On Wed, Mar 24, 2010 at 07:17:26AM +0200, Avi Kivity wrote:
 On 03/23/2010 08:00 PM, Avi Kivity wrote:
 On 03/23/2010 06:06 PM, Anthony Liguori wrote:
 I thought the monitor protocol *was* our API. If not, why not?
 
 It is.  But our API is missing key components like guest 
 enumeration.  So the fundamental topic here is, do we introduce these 
 missing components to allow people to build directly to our interface 
 or do we make use of the functionality that libvirt already provides 
 if they can plumb our API directly to users.
 
 
 Guest enumeration is another API.
 
 Over the kvm call I suggested a qemu concentrator that would keep 
 track of all running qemus, and would hand out monitor connections to 
 users.  It can do the enumeration (likely using qmp).  Libvirt could 
 talk to that, like it does with other hypervisors.
 
 
 To elaborate
 
 qemud
   - daemonaizes itself
   - listens on /var/lib/qemud/guests for incoming guest connections
   - listens on /var/lib/qemud/clients for incoming client connections
   - filters access according to uid (SCM_CREDENTIALS)
   - can pass a new monitor to client (SCM_RIGHTS)
   - supports 'list' command to query running guests
   - async messages on guest startup/exit

My concern is that once you provide this, then next someone wants it to
list inactive guests too. Once you list inactive guests, then you'll
want this to start a guest. Once you start guests then you want cgroups
integration, selinux labelling  so on, until it ends up replicating all
of libvirt's QEMU functionality.

To be able to use the list functionality from libvirt, we need this daemon
to also guarentee id, name  uuid uniqueness for all VMs, both running and
inactive, with separate namespaces for the system vs per-user lists. Or
we have to ignore any instances listed by qemud that were not started  by
libvirt, which rather defeats the purpose.

The filtering access part of this daemon is also not mapping well onto
libvirt's access model, because we don't soley filter based on UID in
libvirtd. We have it configurable based on UID, policykit, SASL, TLS/x509
already, and intend adding role based access control to further filter
things, integrating with the existing apparmour/selinux security models.
A qemud that filters based on UID only, gives users a side-channel to get
around libvirt's access control.

Daniel
-- 
|: Red Hat, Engineering, London-o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org-o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|




Re: [Qemu-devel] Re: [libvirt] Supporting hypervisor specific APIs in libvirt

2010-03-24 Thread Avi Kivity

On 03/24/2010 12:36 PM, Daniel P. Berrange wrote:

On Wed, Mar 24, 2010 at 07:17:26AM +0200, Avi Kivity wrote:
   

On 03/23/2010 08:00 PM, Avi Kivity wrote:
 

On 03/23/2010 06:06 PM, Anthony Liguori wrote:
   

I thought the monitor protocol *was* our API. If not, why not?
   

It is.  But our API is missing key components like guest
enumeration.  So the fundamental topic here is, do we introduce these
missing components to allow people to build directly to our interface
or do we make use of the functionality that libvirt already provides
if they can plumb our API directly to users.

 

Guest enumeration is another API.

Over the kvm call I suggested a qemu concentrator that would keep
track of all running qemus, and would hand out monitor connections to
users.  It can do the enumeration (likely using qmp).  Libvirt could
talk to that, like it does with other hypervisors.

   

To elaborate

qemud
   - daemonaizes itself
   - listens on /var/lib/qemud/guests for incoming guest connections
   - listens on /var/lib/qemud/clients for incoming client connections
   - filters access according to uid (SCM_CREDENTIALS)
   - can pass a new monitor to client (SCM_RIGHTS)
   - supports 'list' command to query running guests
   - async messages on guest startup/exit
 

My concern is that once you provide this, then next someone wants it to
list inactive guests too.


That's impossible, since qemud doesn't manage config files or disk 
images.  It can't even launch guests!



Once you list inactive guests, then you'll
want this to start a guest. Once you start guests then you want cgroups
integration, selinux labelling  so on, until it ends up replicating all
of libvirt's QEMU functionality.

To be able to use the list functionality from libvirt, we need this daemon
to also guarentee id, name  uuid uniqueness for all VMs, both running and
inactive, with separate namespaces for the system vs per-user lists. Or
we have to ignore any instances listed by qemud that were not started  by
libvirt, which rather defeats the purpose.
   


qemud won't guarantee name uniqueness or provide uuids.


The filtering access part of this daemon is also not mapping well onto
libvirt's access model, because we don't soley filter based on UID in
libvirtd. We have it configurable based on UID, policykit, SASL, TLS/x509
already, and intend adding role based access control to further filter
things, integrating with the existing apparmour/selinux security models.
A qemud that filters based on UID only, gives users a side-channel to get
around libvirt's access control.
   


That's true.  Any time you write a multiplexer these issues crop up.  
Much better to stay in single process land where everything is already 
taken care of.


So, at best qemud is a toy for people who are annoyed by libvirt.

--
error compiling committee.c: too many arguments to function





Re: [Qemu-devel] Re: Compile files only once: some planning

2010-03-24 Thread Richard Henderson

On 03/24/2010 02:47 AM, Paolo Bonzini wrote:

1) make CPUState define only common fields. Include CPUState at the
beginning of each per-target CPUXYZState.


Irritatingly, the common fields contain quite big TLBs.  And the
offsets from the start of env affect the compactness of the code
generated from TCG.  We really really want the general registers
to come first to make sure that those offsets fit the host's
reg+offset addressing mode.


r~




[Qemu-devel] Guest memory mapping in Qemu

2010-03-24 Thread Michael T

Hello,

This is an idle question in the sense that, much as I would like to, I know for 
a
fact that I won't have the time to look at implementing this.  I'm not expecting
other people to seriously look at doing it either, but I would be interested on 
your
thoughts.

If the technical documentation at
http://www.usenix.org/publications/library/proceedings/usenix05/tech/freenix/full_papers/bellard/bellard_html/index.html
is still valid (I think it is), Qemu has two modes of handling access to guest 
memory -
system emulation, in which an entire guest address space is mapped on the host, 
and
emulated MMU.  I was wondering whether something in-between would also be 
feasible.
That is, chunks of guest address space (say 4MB chunks for the sake of the 
argument)
are mmapped into the address space of the Qemu process on the host, and when an
access to guest memory is made, there is an initial check to see whether it is 
in
the same chunk as the last one, in which case all the MMU emulation bits could 
be
saved.  I could imagine Qemu keeping a current/most recent chunk for each 
register
which can be used for relative addressing, plus one for non-register-relative
accesses.  It seems to me that this could potentially speed up memory access 
quite a
bit, and as a bonus even make it easy to support x86 segmentation (as part of 
the
bounds check for whether a memory access is in a chunk).

I realise of course that I have glibly glossed over all the nasty bits - off 
the top
of my head keeping track of all the mapped chunks in the host address space, 
lookups
to see if an access outside of the current chunk is inside another mapped one,
invalidating chunks when guest page tables they are based on change.  I am sure 
that
there are many more issues...

Looking forward to reading any responses.

Regards,

Michael
  
_
Hotmail: Trusted email with Microsoft’s powerful SPAM protection.
https://signup.live.com/signup.aspx?id=60969



[Qemu-devel] Re: [RFC] vhost-blk implementation

2010-03-24 Thread Michael S. Tsirkin
On Tue, Mar 23, 2010 at 12:55:07PM -0700, Badari Pulavarty wrote:
 Michael S. Tsirkin wrote:
 On Tue, Mar 23, 2010 at 10:57:33AM -0700, Badari Pulavarty wrote:
   
 Michael S. Tsirkin wrote:
 
 On Mon, Mar 22, 2010 at 05:34:04PM -0700, Badari Pulavarty wrote:
 
 Write Results:
 ==

 I see degraded IO performance when doing sequential IO write
 tests with vhost-blk compared to virtio-blk.

 # time dd of=/dev/vda if=/dev/zero bs=2M oflag=direct

 I get ~110MB/sec with virtio-blk, but I get only ~60MB/sec with
 vhost-blk. Wondering why ?
 
 Try to look and number of interrupts and/or number of exits.
 
 I checked interrupts and IO exits - there is no major noticeable   
 difference between
 vhost-blk and virtio-blk scenerios.
 
 It could also be that you are overrunning some queue.

 I don't see any exit mitigation strategy in your patch:
 when there are already lots of requests in a queue, it's usually
 a good idea to disable notifications and poll the
 queue as requests complete. That could help performance.
 
 Do you mean poll eventfd for new requests instead of waiting for new  
 notifications ?
 Where do you do that in vhost-net code ?
 

 vhost_disable_notify does this.

   
 Unlike network socket, since we are dealing with a file, there is no  
 -poll support for it.
 So I can't poll for the data. And also, Issue I am having is on the   
 write() side.
 

 Not sure I understand.

   
 I looked at it some more - I see 512K write requests on the
 virtio-queue  in both vhost-blk and virtio-blk cases. Both qemu or
 vhost is doing synchronous  writes to page cache (there is no write
 batching in qemu that is affecting this  case).  I still puzzled on
 why virtio-blk outperforms vhost-blk.

 Thanks,
 Badari
 

 If you say the number of requests is the same, we are left with:
 - requests are smaller for some reason?
 - something is causing retries?
   
 No. IO requests sizes are exactly same (512K) in both cases. There are  
 no retries or
 errors in both cases. One thing I am not clear is - for some reason  
 guest kernel
 could push more data into virtio-ring in case of virtio-blk vs  
 vhost-blk. Is this possible ?
 Does guest gets to run much sooner in virtio-blk case than vhost-blk ?  
 Sorry, if its dumb question -
 I don't understand  all the vhost details :(

 Thanks,
 Badari

BTW, did you put the backend in non-blocking mode?
As I said, vhost net passes non-blocking flag to
socket backend, but vfs_write/read that you use does
not have an option to do this.

So we'll need to extend the backend to fix that,
but just for demo purposes, you could set non-blocking
mode on the file from userspace.

-- 
MST




[Qemu-devel] [PATCH][RESEND] qemu: jaso-parser: Output the content of invalid keyword

2010-03-24 Thread Amos Kong
When input some invialid words in QMP port, qemu outputs this error message:
parse error: invalid keyword `%s'
This patch makes qemu output the content, like:
parse error: invalid keyword `unknow_cmd'

Signed-off-by: Amos Kong ak...@redhat.com
---
 json-parser.c |7 ++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/json-parser.c b/json-parser.c
index 579928f..98a82af 100644
--- a/json-parser.c
+++ b/json-parser.c
@@ -12,6 +12,7 @@
  */
 
 #include stdbool.h
+#include stdarg.h
 
 #include qemu-common.h
 #include qstring.h
@@ -93,7 +94,11 @@ static int token_is_escape(QObject *obj, const char *value)
  */
 static void parse_error(JSONParserContext *ctxt, QObject *token, const char 
*msg, ...)
 {
-fprintf(stderr, parse error: %s\n, msg);
+va_list ap;
+va_start(ap, msg);
+fprintf(stderr, parse error: );
+vfprintf(stderr, msg, ap);
+fprintf(stderr, \n);
 }
 
 /**
-- 
1.5.5.6





Re: [Qemu-devel] [PATCH][RESEND] qemu: jaso-parser: Output the content of invalid keyword

2010-03-24 Thread Richard Henderson

On 03/24/2010 05:17 AM, Amos Kong wrote:

-fprintf(stderr, parse error: %s\n, msg);
+va_list ap;
+va_start(ap, msg);
+fprintf(stderr, parse error: );
+vfprintf(stderr, msg, ap);
+fprintf(stderr, \n);


Technically you need va_end here.


r~




Re: [Qemu-devel] Re: [libvirt] Supporting hypervisor specific APIs in libvirt

2010-03-24 Thread Anthony Liguori

On 03/24/2010 05:42 AM, Avi Kivity wrote:



The filtering access part of this daemon is also not mapping well onto
libvirt's access model, because we don't soley filter based on UID in
libvirtd. We have it configurable based on UID, policykit, SASL, 
TLS/x509

already, and intend adding role based access control to further filter
things, integrating with the existing apparmour/selinux security models.
A qemud that filters based on UID only, gives users a side-channel to 
get

around libvirt's access control.


That's true.  Any time you write a multiplexer these issues crop up.  
Much better to stay in single process land where everything is already 
taken care of.


What does a multiplexer give you that making individual qemu instances 
discoverable doesn't give you?  The later doesn't suffer from these 
problems.


Regards,

Anthony Liguori


So, at best qemud is a toy for people who are annoyed by libvirt.







Re: [Qemu-devel] Re: [libvirt] Supporting hypervisor specific APIs in libvirt

2010-03-24 Thread Paul Brook
 I can't quite see what such a libqemu would buy us compared to straight
 QMP.
 
 Talking QMP should be easy, provided you got a suitable JSON library.

I agree. My undesranding is this was one of the large motivations behind using 
JSON: It's a common protocol that already has convenient bindings in most 
languages.  If it's hard[1] for third parties to bind QMP to their favourite 
language/framework then IMHO we've done it wrong.

Paul

[1] Hard compared to any other sane RPC mechanism. Some languages make 
everything hard :-)




Re: [Qemu-devel] Re: [libvirt] Supporting hypervisor specific APIs in libvirt

2010-03-24 Thread Avi Kivity

On 03/24/2010 02:19 PM, Anthony Liguori wrote:

qemud
  - daemonaizes itself
  - listens on /var/lib/qemud/guests for incoming guest connections
  - listens on /var/lib/qemud/clients for incoming client connections
  - filters access according to uid (SCM_CREDENTIALS)
  - can pass a new monitor to client (SCM_RIGHTS)
  - supports 'list' command to query running guests
  - async messages on guest startup/exit



Then guests run with the wrong security context.


Why?  They run with the security context of whoever launched them (could 
be libvirtd).


--
error compiling committee.c: too many arguments to function





Re: [Qemu-devel] Re: [libvirt] Supporting hypervisor specific APIs in libvirt

2010-03-24 Thread Avi Kivity

On 03/24/2010 02:30 PM, Anthony Liguori wrote:

On 03/24/2010 07:27 AM, Avi Kivity wrote:

On 03/24/2010 02:19 PM, Anthony Liguori wrote:

qemud
  - daemonaizes itself
  - listens on /var/lib/qemud/guests for incoming guest connections
  - listens on /var/lib/qemud/clients for incoming client connections
  - filters access according to uid (SCM_CREDENTIALS)
  - can pass a new monitor to client (SCM_RIGHTS)
  - supports 'list' command to query running guests
  - async messages on guest startup/exit



Then guests run with the wrong security context.


Why?  They run with the security context of whoever launched them 
(could be libvirtd).


Because it doesn't have the same security context as qemud and since 
clients have to connect to qemud, qemud has to implement access control.


Yeah.

It's far better to have the qemu instance advertise itself such that 
and client connects directly to it.  Then all of the various 
authorization models will be applied correctly to it.


Agreed.  qemud-exit().

--
error compiling committee.c: too many arguments to function





Re: [Qemu-devel] Re: [libvirt] Supporting hypervisor specific APIs in libvirt

2010-03-24 Thread Avi Kivity

On 03/24/2010 02:32 PM, Anthony Liguori wrote:
You don't get a directory filled with a zillion socket files pointing 
at dead guests.  Agree that's a poor return on investment.



Deleting it on atexit combined with flushing the whole directory at 
startup is a pretty reasonable solution to this (which is ultimately 
how the entirety of /var/run behaves).


If you're really paranoid, you can fork() a helper with a shared pipe 
to implement unlink on close.


My paranoia comes nowhere near to my dislike of forked helpers.

--
error compiling committee.c: too many arguments to function





Re: [Qemu-devel] Re: [libvirt] Supporting hypervisor specific APIs in libvirt

2010-03-24 Thread Anthony Liguori

On 03/24/2010 07:29 AM, Avi Kivity wrote:

On 03/24/2010 02:23 PM, Anthony Liguori wrote:

On 03/24/2010 05:42 AM, Avi Kivity wrote:



The filtering access part of this daemon is also not mapping well onto
libvirt's access model, because we don't soley filter based on UID in
libvirtd. We have it configurable based on UID, policykit, SASL, 
TLS/x509

already, and intend adding role based access control to further filter
things, integrating with the existing apparmour/selinux security 
models.
A qemud that filters based on UID only, gives users a side-channel 
to get

around libvirt's access control.


That's true.  Any time you write a multiplexer these issues crop 
up.  Much better to stay in single process land where everything is 
already taken care of.


What does a multiplexer give you that making individual qemu 
instances discoverable doesn't give you?  The later doesn't suffer 
from these problems.




You don't get a directory filled with a zillion socket files pointing 
at dead guests.  Agree that's a poor return on investment.


Deleting it on atexit combined with flushing the whole directory at 
startup is a pretty reasonable solution to this (which is ultimately how 
the entirety of /var/run behaves).


If you're really paranoid, you can fork() a helper with a shared pipe to 
implement unlink on close.


Regards,

Anthony Liguori

Maybe we want a O_UNLINK_ON_CLOSE for unix domain sockets - but no, 
that's not implementable.








Re: [Qemu-devel] Re: [libvirt] Supporting hypervisor specific APIs in libvirt

2010-03-24 Thread Paul Brook
 On 03/23/2010 09:24 PM, Anthony Liguori wrote:
  We also provide an API for guest creation (the qemu command line).
 
 As an aside, I'd like to see all command line options have qmp
 equivalents (most of them can be implemented with a 'set' command that
 writes qdev values).  This allows a uniform way to control a guest,
 whether at startup or runtime.  You start with a case, cold-plug a
 motherboard, cpus, memory, disk controllers, and power it on.

The main blocker to this is converting all the devices to qdev. partial 
conversions are not sufficient. It's approximately the same problem as a 
machine config file. If you have one then the other should be fairly trivial.
IMO the no_user flag is a bug, and should not exist.

Paul




Re: [Qemu-devel] Re: [libvirt] Supporting hypervisor specific APIs in libvirt

2010-03-24 Thread Anthony Liguori

On 03/24/2010 07:27 AM, Avi Kivity wrote:

On 03/24/2010 02:19 PM, Anthony Liguori wrote:

qemud
  - daemonaizes itself
  - listens on /var/lib/qemud/guests for incoming guest connections
  - listens on /var/lib/qemud/clients for incoming client connections
  - filters access according to uid (SCM_CREDENTIALS)
  - can pass a new monitor to client (SCM_RIGHTS)
  - supports 'list' command to query running guests
  - async messages on guest startup/exit



Then guests run with the wrong security context.


Why?  They run with the security context of whoever launched them 
(could be libvirtd).


Because it doesn't have the same security context as qemud and since 
clients have to connect to qemud, qemud has to implement access control.


It's far better to have the qemu instance advertise itself such that and 
client connects directly to it.  Then all of the various authorization 
models will be applied correctly to it.


Regards,

Anthony Liguori





Re: [Qemu-devel] Re: [libvirt] Supporting hypervisor specific APIs in libvirt

2010-03-24 Thread Avi Kivity

On 03/24/2010 02:30 PM, Paul Brook wrote:

On 03/23/2010 09:24 PM, Anthony Liguori wrote:
 

We also provide an API for guest creation (the qemu command line).
   

As an aside, I'd like to see all command line options have qmp
equivalents (most of them can be implemented with a 'set' command that
writes qdev values).  This allows a uniform way to control a guest,
whether at startup or runtime.  You start with a case, cold-plug a
motherboard, cpus, memory, disk controllers, and power it on.
 

The main blocker to this is converting all the devices to qdev. partial
conversions are not sufficient. It's approximately the same problem as a
machine config file. If you have one then the other should be fairly trivial.
   


Agreed.


IMO the no_user flag is a bug, and should not exist.
   


Sorry, what's that?

--
error compiling committee.c: too many arguments to function





Re: [Qemu-devel] Re: [libvirt] Supporting hypervisor specific APIs in libvirt

2010-03-24 Thread Avi Kivity

On 03/24/2010 02:23 PM, Anthony Liguori wrote:

On 03/24/2010 05:42 AM, Avi Kivity wrote:



The filtering access part of this daemon is also not mapping well onto
libvirt's access model, because we don't soley filter based on UID in
libvirtd. We have it configurable based on UID, policykit, SASL, 
TLS/x509

already, and intend adding role based access control to further filter
things, integrating with the existing apparmour/selinux security 
models.
A qemud that filters based on UID only, gives users a side-channel 
to get

around libvirt's access control.


That's true.  Any time you write a multiplexer these issues crop up.  
Much better to stay in single process land where everything is 
already taken care of.


What does a multiplexer give you that making individual qemu instances 
discoverable doesn't give you?  The later doesn't suffer from these 
problems.




You don't get a directory filled with a zillion socket files pointing at 
dead guests.  Agree that's a poor return on investment.


Maybe we want a O_UNLINK_ON_CLOSE for unix domain sockets - but no, 
that's not implementable.


--
error compiling committee.c: too many arguments to function





Re: [Qemu-devel] Re: [libvirt] Supporting hypervisor specific APIs in libvirt

2010-03-24 Thread Anthony Liguori

On 03/24/2010 07:25 AM, Paul Brook wrote:

I can't quite see what such a libqemu would buy us compared to straight
QMP.

Talking QMP should be easy, provided you got a suitable JSON library.
 

I agree. My undesranding is this was one of the large motivations behind using
JSON: It's a common protocol that already has convenient bindings in most
languages.  If it's hard[1] for third parties to bind QMP to their favourite
language/framework then IMHO we've done it wrong.
   


You can't have convenient bindings to an RPC in C because it doesn't 
support dynamic dispatch.  With most types of RPC, you have an IDL 
description and a code generator.


But regardless of that, there are advantages to us providing a libqemu.  
The biggest one is that we can standardize transport implementations 
that include discovery mechanisms.


If the core of libqemu provided an extensible transport interface, and a 
generic QMP request/completion mechanism, in a Python binding, you would 
never use the IDL generated wrappers but instead use dynamic dispatch to 
invoke arbitrary QMP requests.


But the advantage is that if libvirt provided an API for a QMP transport 
encapsulated in their secure protocol, then provided the plumbed that 
API through their Python interface, you could use it for free in Python 
without having to reinvent the wheel.


Regards,

Anthony Liguori


Paul

[1] Hard compared to any other sane RPC mechanism. Some languages make
everything hard :-)
   






Re: [Qemu-devel] Re: [libvirt] Supporting hypervisor specific APIs in libvirt

2010-03-24 Thread Paul Brook
  IMO the no_user flag is a bug, and should not exist.
 
 Sorry, what's that?

Usually an indication that a device has been incorrectly or inproperly 
converted to the qdev interface.

Paul




Re: [Qemu-devel] Guest memory mapping in Qemu

2010-03-24 Thread Paul Brook
 If the technical documentation at
 http://www.usenix.org/publications/library/proceedings/usenix05/tech/freeni
 x/full_papers/bellard/bellard_html/index.html is still valid (I think it
  is), Qemu has two modes of handling access to guest memory - system
  emulation, in which an entire guest address space is mapped on the host,
  and emulated MMU.  

No. qemu-fast (using the host address space) was removed long ago. There are a 
few stray remnants, but nothing useful. We always use an emulated MMU.

  I was wondering whether something in-between would also
  be feasible. That is, chunks of guest address space (say 4MB chunks for
  the sake of the argument) are mmapped into the address space of the Qemu
  process on the host, and when an access to guest memory is made, there is
  an initial check to see whether it is in the same chunk as the last one,
  in which case all the MMU emulation bits could be saved.  I could imagine
  Qemu keeping a current/most recent chunk for each register which can be
  used for relative addressing, plus one for non-register-relative accesses.
   It seems to me that this could potentially speed up memory access quite a
  bit, and as a bonus even make it easy to support x86 segmentation (as part
  of the bounds check for whether a memory access is in a chunk).

This is effectively shadow paging implemented in userspace via mmap. It's very 
hard to make it work in a sane way, and even harder to make it go fast. TLB 
handling is already a significant bottleneck for many tasks, adding a mmap 
call is likely to make this orders of magnitude worse.  Most guests use 
virtual memory extensively, so the virtual-physical mappings tend to be 
extremely fragmented.

If you really want to do shadow paging for cross environments, you probably 
need to move it into kernel space. Either as a host kernel module, or as a 
bare-metal kernel/application that runs inside KVM. Even then you have to use 
various tricks to partition off a section of the host address space for use by 
qemu. It's not impossible, but it is a significant undertaking with somewhat 
unclear benefits.

Paul





Re: [Qemu-devel] Re: Supporting hypervisor specific APIs in libvirt

2010-03-24 Thread Cole Robinson
On 03/24/2010 03:59 AM, Gerd Hoffmann wrote:
 On 03/24/10 00:13, Jamie Lokier wrote:
 Gerd Hoffmann wrote:
 - networking: man, setting networking is a mess, libvirt just does it
for you.

 +1

 Even when not using libvirt for a reason or another I usually hook my
 virtual machines into virbr0 (libvirt default network).

 I had the opposite problem.  Needed to use multiple bridges and have
 some VMs behind NAT without a bridge (private IPs), and some using
 separately firewalled bridges (needed to behave like real attached
 hardware with their original MACs, but be firewalled).
 
 No problem in theory.  libvirt should detect existing bridges and allow 
 you to attach virtual machines to them.  So you can setup bridges and 
 firewalling for them using usual distro tools and use them for virtual 
 machines.
 
 In practice I've seen this not working correctly in the past, i.e. my 
 br0 didn't pop up in the virt-manager nic setup page.
 

Please file a bug: virt-manager has had bridge detection for years, so
something must be going wrong. In f13 we will use netcf for this, so
even bridge enumeration on remote hosts should work.

Additionally I recently made a change upstream to allow users to
manually enter a bridge name, since netcf isn't supported for all
distros yet.

- Cole




Re: [Qemu-devel] Re: Supporting hypervisor specific APIs in libvirt

2010-03-24 Thread Gerd Hoffmann

In practice I've seen this not working correctly in the past, i.e. my

^^^

br0 didn't pop up in the virt-manager nic setup page.



Please file a bug: virt-manager has had bridge detection for years, so
something must be going wrong.


Works for me now ;)


In f13 we will use netcf for this, so
even bridge enumeration on remote hosts should work.


Yes, remote connections.  F12 + virt-preview works.

cheers,
  Gerd





[Qemu-devel] [PATCH] qemu: jaso-parser: Output the content of invalid keyword

2010-03-24 Thread Amos Kong

When input some invialid words in QMP port, qemu outputs this error message:
parse error: invalid keyword `%s'
This patch makes qemu output the content, like:
parse error: invalid keyword `unknow_cmd'

Signed-off-by: Amos Kong ak...@redhat.com
---
 json-parser.c |7 ++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/json-parser.c b/json-parser.c
index 579928f..98a82af 100644
--- a/json-parser.c
+++ b/json-parser.c
@@ -12,6 +12,7 @@
  */
 
 #include stdbool.h
+#include stdarg.h
 
 #include qemu-common.h
 #include qstring.h
@@ -93,7 +94,11 @@ static int token_is_escape(QObject *obj, const char *value)
  */
 static void parse_error(JSONParserContext *ctxt, QObject *token, const char 
*msg, ...)
 {
-fprintf(stderr, parse error: %s\n, msg);
+va_list ap;
+va_start(ap, msg);
+fprintf(stderr, parse error: );
+vfprintf(stderr, msg, ap);
+fprintf(stderr, \n);
 }
 
 /**
-- 
1.5.5.6





[Qemu-devel] [PATCH] update bochs vbe interface

2010-03-24 Thread Gerd Hoffmann
The bochs vbe interface got a new register a while back, which specifies
the linear framebuffer size in 64k units.  This patch adds support for
the new register to qemu.  With this patch applied vgabios 0.6c works
with qemu.

Signed-off-by: Gerd Hoffmann kra...@redhat.com
---
 hw/vga.c |3 ++-
 hw/vga_int.h |4 +++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/hw/vga.c b/hw/vga.c
index 6a1a059..a5c6997 100644
--- a/hw/vga.c
+++ b/hw/vga.c
@@ -1955,7 +1955,8 @@ void vga_common_reset(VGACommonState *s)
 #ifdef CONFIG_BOCHS_VBE
 s-vbe_index = 0;
 memset(s-vbe_regs, '\0', sizeof(s-vbe_regs));
-s-vbe_regs[VBE_DISPI_INDEX_ID] = VBE_DISPI_ID0;
+s-vbe_regs[VBE_DISPI_INDEX_ID] = VBE_DISPI_ID5;
+s-vbe_regs[VBE_DISPI_INDEX_VIDEO_MEMORY_64K] = s-vram_size / (64 * 1024);
 s-vbe_start_addr = 0;
 s-vbe_line_offset = 0;
 s-vbe_bank_mask = (s-vram_size  16) - 1;
diff --git a/hw/vga_int.h b/hw/vga_int.h
index 23a42ef..4b8fc74 100644
--- a/hw/vga_int.h
+++ b/hw/vga_int.h
@@ -47,13 +47,15 @@
 #define VBE_DISPI_INDEX_VIRT_HEIGHT 0x7
 #define VBE_DISPI_INDEX_X_OFFSET0x8
 #define VBE_DISPI_INDEX_Y_OFFSET0x9
-#define VBE_DISPI_INDEX_NB  0xa
+#define VBE_DISPI_INDEX_VIDEO_MEMORY_64K 0xa
+#define VBE_DISPI_INDEX_NB  0xb
 
 #define VBE_DISPI_ID0   0xB0C0
 #define VBE_DISPI_ID1   0xB0C1
 #define VBE_DISPI_ID2   0xB0C2
 #define VBE_DISPI_ID3   0xB0C3
 #define VBE_DISPI_ID4   0xB0C4
+#define VBE_DISPI_ID5   0xB0C5
 
 #define VBE_DISPI_DISABLED  0x00
 #define VBE_DISPI_ENABLED   0x01
-- 
1.6.6.1





Re: [Qemu-devel] Re: Compile files only once: some planning

2010-03-24 Thread Paolo Bonzini

On 03/24/2010 12:19 PM, Richard Henderson wrote:

On 03/24/2010 02:47 AM, Paolo Bonzini wrote:

1) make CPUState define only common fields. Include CPUState at the
beginning of each per-target CPUXYZState.


Irritatingly, the common fields contain quite big TLBs. And the
offsets from the start of env affect the compactness of the code
generated from TCG. We really really want the general registers
to come first to make sure that those offsets fit the host's
reg+offset addressing mode.


What about adding a 512-bytes (or more) block or something like that at 
the beginning of CPUState with a union, so you can put the per-target 
stuff there?


Paolo




RE: [Qemu-devel] Guest memory mapping in Qemu

2010-03-24 Thread Michael T

 I was wondering whether something in-between would also
 be feasible. That is, chunks of guest address space (say 4MB chunks for
 the sake of the argument) are mmapped into the address space of the Qemu
 process on the host, and when an access to guest memory is made, there is
 an initial check to see whether it is in the same chunk as the last one,
 in which case all the MMU emulation bits could be saved. I could imagine
 Qemu keeping a current/most recent chunk for each register which can be
 used for relative addressing, plus one for non-register-relative accesses.
 It seems to me that this could potentially speed up memory access quite a
 bit, and as a bonus even make it easy to support x86 segmentation (as part
 of the bounds check for whether a memory access is in a chunk).

 This is effectively shadow paging implemented in userspace via mmap. It's very
 hard to make it work in a sane way, and even harder to make it go fast.
Yes, I can imagine that.

 TLB
 handling is already a significant bottleneck for many tasks, adding a mmap
 call is likely to make this orders of magnitude worse.
That might still depend on how much old mappings get reused and how often it
would be necessary to create new ones.  I am tempted to do a bit of profiling of
the memory usage patterns of a few guests to make an estimate.  Does Qemu
have any built-in statistics code that could be useful for this?
[snip]
 If you really want to do shadow paging for cross environments, you probably
 need to move it into kernel space.
That isn't as interesting, as there are already people doing that sort of thing.
The attractive thing about Qemu's emulation mode is that is is pure userspace!

Thanks for commenting.

Michael
  
_
Hotmail: Free, trusted and rich email service.
https://signup.live.com/signup.aspx?id=60969



[Qemu-devel] [PATCH 01/15] virtio-serial: save/load: Ensure target has enough ports

2010-03-24 Thread Amit Shah
The target could be started with max_nr_ports for a virtio-serial device
lesser than what was available on the source machine. Fail the migration
in such a case.

Signed-off-by: Amit Shah amit.s...@redhat.com
Reported-by: Juan Quintela quint...@redhat.com
---
 hw/virtio-serial-bus.c |   13 +++--
 1 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
index 17c1ec1..9a7f0c1 100644
--- a/hw/virtio-serial-bus.c
+++ b/hw/virtio-serial-bus.c
@@ -374,10 +374,13 @@ static void virtio_serial_save(QEMUFile *f, void *opaque)
 
 /* Items in struct VirtIOSerial */
 
+qemu_put_be32s(f, s-bus-max_nr_ports);
+
 /* Do this because we might have hot-unplugged some ports */
 nr_active_ports = 0;
-QTAILQ_FOREACH(port, s-ports, next)
+QTAILQ_FOREACH(port, s-ports, next) {
 nr_active_ports++;
+}
 
 qemu_put_be32s(f, nr_active_ports);
 
@@ -399,7 +402,7 @@ static int virtio_serial_load(QEMUFile *f, void *opaque, 
int version_id)
 {
 VirtIOSerial *s = opaque;
 VirtIOSerialPort *port;
-uint32_t nr_active_ports;
+uint32_t max_nr_ports, nr_active_ports;
 unsigned int i;
 
 if (version_id  2) {
@@ -420,6 +423,12 @@ static int virtio_serial_load(QEMUFile *f, void *opaque, 
int version_id)
 
 /* Items in struct VirtIOSerial */
 
+qemu_get_be32s(f, max_nr_ports);
+if (max_nr_ports  s-bus-max_nr_ports) {
+/* Source could have more ports than us. Fail migration. */
+return -EINVAL;
+}
+
 qemu_get_be32s(f, nr_active_ports);
 
 /* Items in struct VirtIOSerialPort */
-- 
1.6.2.5





[Qemu-devel] [PATCH 02/15] virtio-serial: save/load: Ensure nr_ports on src and dest are same.

2010-03-24 Thread Amit Shah
The number of ports on the source as well as the destination machines
should match. If they don't, it means some ports that got hotplugged on
the source aren't instantiated on the destination. Or that ports that
were hot-unplugged on the source are created on the destination.

Signed-off-by: Amit Shah amit.s...@redhat.com
Reported-by: Juan Quintela quint...@redhat.com
---
 hw/virtio-serial-bus.c |   18 --
 1 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
index 9a7f0c1..d31e62d 100644
--- a/hw/virtio-serial-bus.c
+++ b/hw/virtio-serial-bus.c
@@ -402,7 +402,7 @@ static int virtio_serial_load(QEMUFile *f, void *opaque, 
int version_id)
 {
 VirtIOSerial *s = opaque;
 VirtIOSerialPort *port;
-uint32_t max_nr_ports, nr_active_ports;
+uint32_t max_nr_ports, nr_active_ports, nr_ports;
 unsigned int i;
 
 if (version_id  2) {
@@ -419,7 +419,21 @@ static int virtio_serial_load(QEMUFile *f, void *opaque, 
int version_id)
 /* The config space */
 qemu_get_be16s(f, s-config.cols);
 qemu_get_be16s(f, s-config.rows);
-s-config.nr_ports = qemu_get_be32(f);
+nr_ports = qemu_get_be32(f);
+
+if (nr_ports != s-config.nr_ports) {
+/*
+ * Source hot-plugged/unplugged ports and we don't have all of
+ * them here.
+ *
+ * Note: This condition cannot check for all hotplug/unplug
+ * events: eg, if one port was hot-plugged and one was
+ * unplugged, the nr_ports remains the same but the port id's
+ * would have changed and we won't catch it here. A later
+ * check for !find_port_by_id() will confirm if this happened.
+ */
+return -EINVAL;
+}
 
 /* Items in struct VirtIOSerial */
 
-- 
1.6.2.5





[Qemu-devel] [PATCH 04/15] virtio-serial: save/load: Send target host connection status if different

2010-03-24 Thread Amit Shah
If the host connection to a port is closed on the destination machine
after migration, whereas the connection was open on the source, the
guest has to be informed of that.

Similar for a host connection open on the destination.

Signed-off-by: Amit Shah amit.s...@redhat.com
---
 hw/virtio-serial-bus.c |   11 +++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
index 5316ef6..484dc94 100644
--- a/hw/virtio-serial-bus.c
+++ b/hw/virtio-serial-bus.c
@@ -395,6 +395,7 @@ static void virtio_serial_save(QEMUFile *f, void *opaque)
  */
 qemu_put_be32s(f, port-id);
 qemu_put_byte(f, port-guest_connected);
+qemu_put_byte(f, port-host_connected);
 }
 }
 
@@ -448,6 +449,7 @@ static int virtio_serial_load(QEMUFile *f, void *opaque, 
int version_id)
 /* Items in struct VirtIOSerialPort */
 for (i = 0; i  nr_active_ports; i++) {
 uint32_t id;
+bool host_connected;
 
 id = qemu_get_be32(f);
 port = find_port_by_id(s, id);
@@ -460,6 +462,15 @@ static int virtio_serial_load(QEMUFile *f, void *opaque, 
int version_id)
 }
 
 port-guest_connected = qemu_get_byte(f);
+host_connected = qemu_get_byte(f);
+if (host_connected != port-host_connected) {
+/*
+ * We have to let the guest know of the host connection
+ * status change
+ */
+send_control_event(port, VIRTIO_CONSOLE_PORT_OPEN,
+   port-host_connected);
+}
 }
 
 return 0;
-- 
1.6.2.5





[Qemu-devel] [PATCH 05/15] virtio-serial: Use control messages to notify guest of new ports

2010-03-24 Thread Amit Shah
Allow the port 'id's to be set by a user on the command line. This is
needed by management apps that will want a stable port numbering scheme
for hot-plug/unplug and migration.

Since the port numbers are shared with the guest (to identify ports in
control messages), we just send a control message to the guest
indicating addition of new ports (hot-plug) or notifying the guest of
the available ports when the guest sends us a DEVICE_READY control
message.

Signed-off-by: Amit Shah amit.s...@redhat.com
---
 hw/virtio-console.c|2 +
 hw/virtio-serial-bus.c |  181 +++-
 hw/virtio-serial.h |   17 +++--
 3 files changed, 130 insertions(+), 70 deletions(-)

diff --git a/hw/virtio-console.c b/hw/virtio-console.c
index bd44ec6..17b221d 100644
--- a/hw/virtio-console.c
+++ b/hw/virtio-console.c
@@ -99,6 +99,7 @@ static VirtIOSerialPortInfo virtconsole_info = {
 .exit  = virtconsole_exitfn,
 .qdev.props = (Property[]) {
 DEFINE_PROP_UINT8(is_console, VirtConsole, port.is_console, 1),
+DEFINE_PROP_UINT32(nr, VirtConsole, port.id, VIRTIO_CONSOLE_BAD_ID),
 DEFINE_PROP_CHR(chardev, VirtConsole, chr),
 DEFINE_PROP_STRING(name, VirtConsole, port.name),
 DEFINE_PROP_END_OF_LIST(),
@@ -133,6 +134,7 @@ static VirtIOSerialPortInfo virtserialport_info = {
 .init  = virtserialport_initfn,
 .exit  = virtconsole_exitfn,
 .qdev.props = (Property[]) {
+DEFINE_PROP_UINT32(nr, VirtConsole, port.id, VIRTIO_CONSOLE_BAD_ID),
 DEFINE_PROP_CHR(chardev, VirtConsole, chr),
 DEFINE_PROP_STRING(name, VirtConsole, port.name),
 DEFINE_PROP_END_OF_LIST(),
diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
index 484dc94..00e8616 100644
--- a/hw/virtio-serial-bus.c
+++ b/hw/virtio-serial-bus.c
@@ -41,6 +41,10 @@ struct VirtIOSerial {
 VirtIOSerialBus *bus;
 
 QTAILQ_HEAD(, VirtIOSerialPort) ports;
+
+/* bitmap for identifying active ports */
+uint32_t *ports_map;
+
 struct virtio_console_config config;
 };
 
@@ -48,6 +52,10 @@ static VirtIOSerialPort *find_port_by_id(VirtIOSerial *vser, 
uint32_t id)
 {
 VirtIOSerialPort *port;
 
+if (id == VIRTIO_CONSOLE_BAD_ID) {
+return NULL;
+}
+
 QTAILQ_FOREACH(port, vser-ports, next) {
 if (port-id == id)
 return port;
@@ -208,14 +216,25 @@ static void handle_control_message(VirtIOSerial *vser, 
void *buf)
 size_t buffer_len;
 
 gcpkt = buf;
-port = find_port_by_id(vser, ldl_p(gcpkt-id));
-if (!port)
-return;
 
 cpkt.event = lduw_p(gcpkt-event);
 cpkt.value = lduw_p(gcpkt-value);
 
+port = find_port_by_id(vser, ldl_p(gcpkt-id));
+if (!port  cpkt.event != VIRTIO_CONSOLE_DEVICE_READY)
+return;
+
 switch(cpkt.event) {
+case VIRTIO_CONSOLE_DEVICE_READY:
+/*
+ * The device is up, we can now tell the device about all the
+ * ports we have here.
+ */
+QTAILQ_FOREACH(port, vser-ports, next) {
+send_control_event(port, VIRTIO_CONSOLE_PORT_ADD, 1);
+}
+break;
+
 case VIRTIO_CONSOLE_PORT_READY:
 /*
  * Now that we know the guest asked for the port name, we're
@@ -370,13 +389,16 @@ static void virtio_serial_save(QEMUFile *f, void *opaque)
 /* The config space */
 qemu_put_be16s(f, s-config.cols);
 qemu_put_be16s(f, s-config.rows);
-qemu_put_be32s(f, s-config.nr_ports);
 
-/* Items in struct VirtIOSerial */
+qemu_put_be32s(f, s-config.max_nr_ports);
+
+/* The ports map */
 
-qemu_put_be32s(f, s-bus-max_nr_ports);
+qemu_put_buffer(f, (uint8_t *)s-ports_map,
+sizeof(uint32_t) * (s-config.max_nr_ports + 31) / 32);
+
+/* Ports */
 
-/* Do this because we might have hot-unplugged some ports */
 nr_active_ports = 0;
 QTAILQ_FOREACH(port, s-ports, next) {
 nr_active_ports++;
@@ -388,11 +410,6 @@ static void virtio_serial_save(QEMUFile *f, void *opaque)
  * Items in struct VirtIOSerialPort.
  */
 QTAILQ_FOREACH(port, s-ports, next) {
-/*
- * We put the port number because we may not have an active
- * port at id 0 that's reserved for a console port, or in case
- * of ports that might have gotten unplugged
- */
 qemu_put_be32s(f, port-id);
 qemu_put_byte(f, port-guest_connected);
 qemu_put_byte(f, port-host_connected);
@@ -403,7 +420,8 @@ static int virtio_serial_load(QEMUFile *f, void *opaque, 
int version_id)
 {
 VirtIOSerial *s = opaque;
 VirtIOSerialPort *port;
-uint32_t max_nr_ports, nr_active_ports, nr_ports;
+size_t ports_map_size;
+uint32_t max_nr_ports, nr_active_ports, *ports_map;
 unsigned int i;
 
 if (version_id  2) {
@@ -420,29 +438,28 @@ static int virtio_serial_load(QEMUFile *f, void *opaque, 
int version_id)
 /* The config space */
 

[Qemu-devel] [PATCH 06/15] virtio-serial: whitespace: match surrounding code

2010-03-24 Thread Amit Shah
The virtio-serial code doesn't mix declarations and definitions, so
separate them out on different lines.

Signed-off-by: Amit Shah amit.s...@redhat.com
---
 hw/virtio-serial-bus.c |5 -
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
index 00e8616..80f0259 100644
--- a/hw/virtio-serial-bus.c
+++ b/hw/virtio-serial-bus.c
@@ -354,7 +354,10 @@ static void handle_input(VirtIODevice *vdev, VirtQueue *vq)
 
 static uint32_t get_features(VirtIODevice *vdev, uint32_t features)
 {
-VirtIOSerial *vser = DO_UPCAST(VirtIOSerial, vdev, vdev);
+VirtIOSerial *vser;
+
+vser = DO_UPCAST(VirtIOSerial, vdev, vdev);
+
 if (vser-bus-max_nr_ports  1) {
 features |= (1  VIRTIO_CONSOLE_F_MULTIPORT);
 }
-- 
1.6.2.5





[Qemu-devel] [PATCH 08/15] virtio-serial: Update copyright year to 2010

2010-03-24 Thread Amit Shah
Signed-off-by: Amit Shah amit.s...@redhat.com
---
 hw/virtio-console.c|2 +-
 hw/virtio-serial-bus.c |2 +-
 hw/virtio-serial.h |2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/virtio-console.c b/hw/virtio-console.c
index 17b221d..6b8 100644
--- a/hw/virtio-console.c
+++ b/hw/virtio-console.c
@@ -1,7 +1,7 @@
 /*
  * Virtio Console and Generic Serial Port Devices
  *
- * Copyright Red Hat, Inc. 2009
+ * Copyright Red Hat, Inc. 2009, 2010
  *
  * Authors:
  *  Amit Shah amit.s...@redhat.com
diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
index 4435c62..a19e751 100644
--- a/hw/virtio-serial-bus.c
+++ b/hw/virtio-serial-bus.c
@@ -1,7 +1,7 @@
 /*
  * A bus for connecting virtio serial and console ports
  *
- * Copyright (C) 2009 Red Hat, Inc.
+ * Copyright (C) 2009, 2010 Red Hat, Inc.
  *
  * Author(s):
  *  Amit Shah amit.s...@redhat.com
diff --git a/hw/virtio-serial.h b/hw/virtio-serial.h
index 0548689..f023873 100644
--- a/hw/virtio-serial.h
+++ b/hw/virtio-serial.h
@@ -2,7 +2,7 @@
  * Virtio Serial / Console Support
  *
  * Copyright IBM, Corp. 2008
- * Copyright Red Hat, Inc. 2009
+ * Copyright Red Hat, Inc. 2009, 2010
  *
  * Authors:
  *  Christian Ehrhardt ehrha...@linux.vnet.ibm.com
-- 
1.6.2.5





[Qemu-devel] [PATCH 09/15] virtio-serial: Propagate errors in initialising ports / devices in guest

2010-03-24 Thread Amit Shah
If adding of ports or devices in the guest fails we can send out a QMP
event so that management software can deal with it.

Signed-off-by: Amit Shah amit.s...@redhat.com
---
 hw/virtio-serial-bus.c |   10 ++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
index a19e751..33083af 100644
--- a/hw/virtio-serial-bus.c
+++ b/hw/virtio-serial-bus.c
@@ -223,6 +223,11 @@ static void handle_control_message(VirtIOSerial *vser, 
void *buf)
 
 switch(cpkt.event) {
 case VIRTIO_CONSOLE_DEVICE_READY:
+if (!cpkt.value) {
+error_report(virtio-serial-bus: Guest failure in adding device 
%s\n,
+vser-bus-qbus.name);
+break;
+}
 /*
  * The device is up, we can now tell the device about all the
  * ports we have here.
@@ -233,6 +238,11 @@ static void handle_control_message(VirtIOSerial *vser, 
void *buf)
 break;
 
 case VIRTIO_CONSOLE_PORT_READY:
+if (!cpkt.value) {
+error_report(virtio-serial-bus: Guest failure in adding port %u 
for device %s\n,
+ port-id, vser-bus-qbus.name);
+break;
+}
 /*
  * Now that we know the guest asked for the port name, we're
  * sure the guest has initialised whatever state is necessary
-- 
1.6.2.5





[Qemu-devel] [PATCH 07/15] virtio-serial: Remove redundant check for 0-sized write request

2010-03-24 Thread Amit Shah
The check for a 0-sized write request to a guest port is not necessary;
the while loop below won't be executed in this case and all will be
fine.

Signed-off-by: Amit Shah amit.s...@redhat.com
---
 hw/virtio-serial-bus.c |3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
index 80f0259..4435c62 100644
--- a/hw/virtio-serial-bus.c
+++ b/hw/virtio-serial-bus.c
@@ -91,9 +91,6 @@ static size_t write_to_port(VirtIOSerialPort *port,
 if (!virtio_queue_ready(vq)) {
 return 0;
 }
-if (!size) {
-return 0;
-}
 
 while (offset  size) {
 int i;
-- 
1.6.2.5





[Qemu-devel] [PATCH 10/15] virtio-serial: Add QMP events for failed port/device add

2010-03-24 Thread Amit Shah
When adding a port or a device to the guest fails, management software
might be interested in knowing and then cleaning up the host-side of the
port. Introduce QMP events to signal such errors.

Signed-off-by: Amit Shah amit.s...@redhat.com
CC: Luiz Capitulino lcapitul...@redhat.com
---
 QMP/qmp-events.txt |   48 
 hw/virtio-serial-bus.c |   15 +++
 monitor.c  |3 +++
 monitor.h  |1 +
 4 files changed, 67 insertions(+), 0 deletions(-)

diff --git a/QMP/qmp-events.txt b/QMP/qmp-events.txt
index a94e9b4..f13cf45 100644
--- a/QMP/qmp-events.txt
+++ b/QMP/qmp-events.txt
@@ -188,3 +188,51 @@ Example:
 
 Note: If action is reset, shutdown, or pause the WATCHDOG event is
 followed respectively by the RESET, SHUTDOWN, or STOP events.
+
+VIRTIO_SERIAL
+-
+
+Emitted when errors occur in guest port add or guest device add.
+
+Data:
+
+- device: The virtio-serial device that triggered the event {json-string}
+  This is the name given to the bus on the command line:
+-device virtio-serial,id=foo
+  here, the device name is foo
+
+- port: The port number that triggered the event {json-number}
+  This is the number given to the port on the command line:
+-device virtserialport,nr=2
+  here, the port number is 2. If not mentioned on the command
+  line, the number is auto-assigned.
+
+- result: The result of the operation {json-string}
+  This is one of the following:
+ pass, fail
+
+- operation: The operation that triggered the event {json-sring}
+  This is one of the following:
+ port_add, device_add
+
+Example:
+
+Port 0 add failure in the guest:
+
+{ timestamp: {seconds: 1269438649, microseconds: 851170},
+  event: VIRTIO_SERIAL,
+  data: {
+device: virtio-serial-bus.0,
+port: 0,
+result: fail,
+operation: port_add } }
+
+Device add failure in the guest:
+
+{ timestamp: {seconds: 1269438702, microseconds: 78114},
+  event: VIRTIO_SERIAL,
+  data: {
+device: virtio-serial-bus.0,
+port: 4294967295,
+result: fail,
+operation: device_add } }
diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
index 33083af..efcc66c 100644
--- a/hw/virtio-serial-bus.c
+++ b/hw/virtio-serial-bus.c
@@ -16,6 +16,7 @@
  */
 
 #include monitor.h
+#include qemu-objects.h
 #include qemu-queue.h
 #include sysbus.h
 #include virtio-serial.h
@@ -204,6 +205,17 @@ size_t virtio_serial_guest_ready(VirtIOSerialPort *port)
 return 0;
 }
 
+static void mon_event(const char *device, const uint32_t port_id,
+  const char *operation, const char *result)
+{
+QObject *data;
+
+data = qobject_from_jsonf({ 'device': %s, 'port': %ld, 'operation': %s, 
'result': %s },
+  device, (int64_t)port_id, operation, result);
+monitor_protocol_event(QEVENT_VIRTIO_SERIAL, data);
+qobject_decref(data);
+}
+
 /* Guest wants to notify us of some event */
 static void handle_control_message(VirtIOSerial *vser, void *buf)
 {
@@ -226,6 +238,8 @@ static void handle_control_message(VirtIOSerial *vser, void 
*buf)
 if (!cpkt.value) {
 error_report(virtio-serial-bus: Guest failure in adding device 
%s\n,
 vser-bus-qbus.name);
+mon_event(vser-bus-qbus.name, VIRTIO_CONSOLE_BAD_ID,
+ device_add, fail);
 break;
 }
 /*
@@ -241,6 +255,7 @@ static void handle_control_message(VirtIOSerial *vser, void 
*buf)
 if (!cpkt.value) {
 error_report(virtio-serial-bus: Guest failure in adding port %u 
for device %s\n,
  port-id, vser-bus-qbus.name);
+mon_event(vser-bus-qbus.name, port-id, port_add, fail);
 break;
 }
 /*
diff --git a/monitor.c b/monitor.c
index 0448a70..9e5bfe7 100644
--- a/monitor.c
+++ b/monitor.c
@@ -441,6 +441,9 @@ void monitor_protocol_event(MonitorEvent event, QObject 
*data)
 case QEVENT_WATCHDOG:
 event_name = WATCHDOG;
 break;
+case QEVENT_VIRTIO_SERIAL:
+event_name = VIRTIO_SERIAL;
+break;
 default:
 abort();
 break;
diff --git a/monitor.h b/monitor.h
index bd4ae34..ea4df8a 100644
--- a/monitor.h
+++ b/monitor.h
@@ -28,6 +28,7 @@ typedef enum MonitorEvent {
 QEVENT_BLOCK_IO_ERROR,
 QEVENT_RTC_CHANGE,
 QEVENT_WATCHDOG,
+QEVENT_VIRTIO_SERIAL,
 QEVENT_MAX,
 } MonitorEvent;
 
-- 
1.6.2.5





[Qemu-devel] [PATCH 11/15] virtio-serial: Send out guest data to ports only if port is opened

2010-03-24 Thread Amit Shah
Data should be written only when ports are open.

Signed-off-by: Amit Shah amit.s...@redhat.com
---
 hw/virtio-serial-bus.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
index efcc66c..80fbff4 100644
--- a/hw/virtio-serial-bus.c
+++ b/hw/virtio-serial-bus.c
@@ -350,6 +350,11 @@ static void handle_output(VirtIODevice *vdev, VirtQueue 
*vq)
 goto next_buf;
 }
 
+   if (!port-host_connected) {
+ret = 0;
+goto next_buf;
+}
+
 /*
  * A port may not have any handler registered for consuming the
  * data that the guest sends or it may not have a chardev associated
-- 
1.6.2.5





[Qemu-devel] [PATCH 12/15] iov: Introduce a new file for helpers around iovs, add iov_from_buf()

2010-03-24 Thread Amit Shah
The virtio-net code uses iov_fill() which fills an iov from a linear
buffer. The virtio-serial-bus code does something similar in an
open-coded function.

Create a new iov.c file that has iov_from_buf().

Convert virtio-net and virtio-serial-bus over to use this functionality.
virtio-net used ints to hold sizes, the new function is going to use
size_t types.

Later commits will add the opposite functionality -- going from an iov
to a linear buffer.

Signed-off-by: Amit Shah amit.s...@redhat.com
---
 Makefile.objs  |1 +
 hw/iov.c   |   33 +
 hw/iov.h   |   16 
 hw/virtio-net.c|   20 +++-
 hw/virtio-serial-bus.c |   15 +++
 5 files changed, 60 insertions(+), 25 deletions(-)
 create mode 100644 hw/iov.c
 create mode 100644 hw/iov.h

diff --git a/Makefile.objs b/Makefile.objs
index 281f7a6..212ae1d 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -127,6 +127,7 @@ user-obj-y += cutils.o cache-utils.o
 # libhw
 
 hw-obj-y =
+hw-obj-y += iov.o
 hw-obj-y += loader.o
 hw-obj-y += virtio.o virtio-console.o virtio-pci.o
 hw-obj-y += fw_cfg.o pci.o pci_host.o pcie_host.o
diff --git a/hw/iov.c b/hw/iov.c
new file mode 100644
index 000..07bd499
--- /dev/null
+++ b/hw/iov.c
@@ -0,0 +1,33 @@
+/*
+ * Helpers for getting linearized buffers from iov / filling buffers into iovs
+ *
+ * Copyright IBM, Corp. 2007, 2008
+ * Copyright (C) 2010 Red Hat, Inc.
+ *
+ * Author(s):
+ *  Anthony Liguori aligu...@us.ibm.com
+ *  Amit Shah amit.s...@redhat.com
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#include iov.h
+
+size_t iov_from_buf(struct iovec *iov, unsigned int iovcnt,
+const void *buf, size_t size)
+{
+size_t offset;
+unsigned int i;
+
+offset = 0;
+for (i = 0; offset  size  i  iovcnt; i++) {
+size_t len;
+
+len = MIN(iov[i].iov_len, size - offset);
+
+memcpy(iov[i].iov_base, buf + offset, len);
+offset += len;
+}
+return offset;
+}
diff --git a/hw/iov.h b/hw/iov.h
new file mode 100644
index 000..5e3e541
--- /dev/null
+++ b/hw/iov.h
@@ -0,0 +1,16 @@
+/*
+ * Helpers for getting linearized buffers from iov / filling buffers into iovs
+ *
+ * Copyright (C) 2010 Red Hat, Inc.
+ *
+ * Author(s):
+ *  Amit Shah amit.s...@redhat.com
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#include qemu-common.h
+
+size_t iov_from_buf(struct iovec *iov, unsigned int iovcnt,
+const void *buf, size_t size);
diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index be33c68..273e7f9 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -11,6 +11,7 @@
  *
  */
 
+#include iov.h
 #include virtio.h
 #include net.h
 #include net/checksum.h
@@ -423,21 +424,6 @@ static void work_around_broken_dhclient(struct 
virtio_net_hdr *hdr,
 }
 }
 
-static int iov_fill(struct iovec *iov, int iovcnt, const void *buf, int count)
-{
-int offset, i;
-
-offset = i = 0;
-while (offset  count  i  iovcnt) {
-int len = MIN(iov[i].iov_len, count - offset);
-memcpy(iov[i].iov_base, buf + offset, len);
-offset += len;
-i++;
-}
-
-return offset;
-}
-
 static int receive_header(VirtIONet *n, struct iovec *iov, int iovcnt,
   const void *buf, size_t size, size_t hdr_len)
 {
@@ -573,8 +559,8 @@ static ssize_t virtio_net_receive(VLANClientState *nc, 
const uint8_t *buf, size_
 }
 
 /* copy in packet.  ugh */
-len = iov_fill(sg, elem.in_num,
-   buf + offset, size - offset);
+len = iov_from_buf(sg, elem.in_num,
+   buf + offset, size - offset);
 total += len;
 
 /* signal other side */
diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
index 80fbff4..bd1223e 100644
--- a/hw/virtio-serial-bus.c
+++ b/hw/virtio-serial-bus.c
@@ -15,6 +15,7 @@
  * the COPYING file in the top-level directory.
  */
 
+#include iov.h
 #include monitor.h
 #include qemu-objects.h
 #include qemu-queue.h
@@ -85,27 +86,25 @@ static size_t write_to_port(VirtIOSerialPort *port,
 {
 VirtQueueElement elem;
 VirtQueue *vq;
-size_t offset = 0;
-size_t len = 0;
+size_t offset;
 
 vq = port-ivq;
 if (!virtio_queue_ready(vq)) {
 return 0;
 }
 
+offset = 0;
 while (offset  size) {
-int i;
+size_t len;
 
 if (!virtqueue_pop(vq, elem)) {
 break;
 }
 
-for (i = 0; offset  size  i  elem.in_num; i++) {
-len = MIN(elem.in_sg[i].iov_len, size - offset);
+len = iov_from_buf(elem.in_sg, elem.in_num,
+   buf + offset, size - offset);
+offset += len;
 
-memcpy(elem.in_sg[i].iov_base, buf + offset, 

[Qemu-devel] [PATCH 13/15] iov: Add iov_to_buf and iov_size helpers

2010-03-24 Thread Amit Shah
iov_to_buf() puts the buffer contents in the iov in a linearized buffer.

iov_size() gets the length of the contents in the iov.

The iov_to_buf() function is the memcpy_to_iovec() function that was
used in virtio-ballon.c.

Signed-off-by: Amit Shah amit.s...@redhat.com
---
 hw/iov.c|   37 +
 hw/iov.h|3 +++
 hw/virtio-balloon.c |   35 ---
 3 files changed, 44 insertions(+), 31 deletions(-)

diff --git a/hw/iov.c b/hw/iov.c
index 07bd499..d4013cd 100644
--- a/hw/iov.c
+++ b/hw/iov.c
@@ -31,3 +31,40 @@ size_t iov_from_buf(struct iovec *iov, unsigned int iovcnt,
 }
 return offset;
 }
+
+size_t iov_to_buf(struct iovec *iov, unsigned int iovcnt,
+  void *buf, size_t offset, size_t size)
+{
+uint8_t *ptr;
+size_t iov_off, buf_off;
+unsigned int i;
+
+ptr = buf;
+iov_off = 0;
+buf_off = 0;
+for (i = 0; i  iovcnt  size; i++) {
+if (offset  (iov_off + iov[i].iov_len)) {
+size_t len = MIN((iov_off + iov[i].iov_len) - offset , size);
+
+memcpy(ptr + buf_off, iov[i].iov_base + (offset - iov_off), len);
+
+buf_off += len;
+offset += len;
+size -= len;
+}
+iov_off += iov[i].iov_len;
+}
+return buf_off;
+}
+
+size_t iov_size(struct iovec *iov, unsigned int iovcnt)
+{
+size_t len;
+unsigned int i;
+
+len = 0;
+for (i = 0; i  iovcnt; i++) {
+len += iov[i].iov_len;
+}
+return len;
+}
diff --git a/hw/iov.h b/hw/iov.h
index 5e3e541..c977ff1 100644
--- a/hw/iov.h
+++ b/hw/iov.h
@@ -14,3 +14,6 @@
 
 size_t iov_from_buf(struct iovec *iov, unsigned int iovcnt,
 const void *buf, size_t size);
+size_t iov_to_buf(struct iovec *iov, unsigned int iovcnt,
+  void *buf, size_t offset, size_t size);
+size_t iov_size(struct iovec *iov, unsigned int iovcnt);
diff --git a/hw/virtio-balloon.c b/hw/virtio-balloon.c
index 6d12024..4414eae 100644
--- a/hw/virtio-balloon.c
+++ b/hw/virtio-balloon.c
@@ -11,6 +11,7 @@
  *
  */
 
+#include iov.h
 #include qemu-common.h
 #include virtio.h
 #include pc.h
@@ -91,33 +92,6 @@ static QObject *get_stats_qobject(VirtIOBalloon *dev)
 return QOBJECT(dict);
 }
 
-/* FIXME: once we do a virtio refactoring, this will get subsumed into common
- * code */
-static size_t memcpy_from_iovector(void *data, size_t offset, size_t size,
-   struct iovec *iov, int iovlen)
-{
-int i;
-uint8_t *ptr = data;
-size_t iov_off = 0;
-size_t data_off = 0;
-
-for (i = 0; i  iovlen  size; i++) {
-if (offset  (iov_off + iov[i].iov_len)) {
-size_t len = MIN((iov_off + iov[i].iov_len) - offset , size);
-
-memcpy(ptr + data_off, iov[i].iov_base + (offset - iov_off), len);
-
-data_off += len;
-offset += len;
-size -= len;
-}
-
-iov_off += iov[i].iov_len;
-}
-
-return data_off;
-}
-
 static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
 {
 VirtIOBalloon *s = to_virtio_balloon(vdev);
@@ -127,8 +101,7 @@ static void virtio_balloon_handle_output(VirtIODevice 
*vdev, VirtQueue *vq)
 size_t offset = 0;
 uint32_t pfn;
 
-while (memcpy_from_iovector(pfn, offset, 4,
-elem.out_sg, elem.out_num) == 4) {
+while (iov_to_buf(elem.out_sg, elem.out_num, pfn, offset, 4) == 4) {
 ram_addr_t pa;
 ram_addr_t addr;
 
@@ -180,8 +153,8 @@ static void virtio_balloon_receive_stats(VirtIODevice 
*vdev, VirtQueue *vq)
  */
 reset_stats(s);
 
-while (memcpy_from_iovector(stat, offset, sizeof(stat), elem-out_sg,
-elem-out_num) == sizeof(stat)) {
+while (iov_to_buf(elem-out_sg, elem-out_num, stat, offset, sizeof(stat))
+   == sizeof(stat)) {
 uint16_t tag = tswap16(stat.tag);
 uint64_t val = tswap64(stat.val);
 
-- 
1.6.2.5





[Qemu-devel] [PATCH 14/15] virtio-serial: Handle scatter-gather buffers for control messages

2010-03-24 Thread Amit Shah
Current control messages are small enough to not be split into multiple
buffers but we could run into such a situation in the future or a
malicious guest could cause such a situation.

So handle the entire iov request for control messages.

Also ensure the size of the control request is = what we expect
otherwise we risk accessing memory that we don't own.

Signed-off-by: Amit Shah amit.s...@redhat.com
CC: Avi Kivity a...@redhat.com
Reported-by: Avi Kivity a...@redhat.com
---
 hw/virtio-serial-bus.c |   34 +++---
 1 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
index bd1223e..3edfeca 100644
--- a/hw/virtio-serial-bus.c
+++ b/hw/virtio-serial-bus.c
@@ -216,7 +216,7 @@ static void mon_event(const char *device, const uint32_t 
port_id,
 }
 
 /* Guest wants to notify us of some event */
-static void handle_control_message(VirtIOSerial *vser, void *buf)
+static void handle_control_message(VirtIOSerial *vser, void *buf, size_t len)
 {
 struct VirtIOSerialPort *port;
 struct virtio_console_control cpkt, *gcpkt;
@@ -225,6 +225,11 @@ static void handle_control_message(VirtIOSerial *vser, 
void *buf)
 
 gcpkt = buf;
 
+if (len  sizeof(cpkt)) {
+/* The guest sent an invalid control packet */
+return;
+}
+
 cpkt.event = lduw_p(gcpkt-event);
 cpkt.value = lduw_p(gcpkt-value);
 
@@ -321,12 +326,35 @@ static void control_out(VirtIODevice *vdev, VirtQueue *vq)
 {
 VirtQueueElement elem;
 VirtIOSerial *vser;
+uint8_t *buf;
+size_t len;
 
 vser = DO_UPCAST(VirtIOSerial, vdev, vdev);
 
+len = 0;
+buf = NULL;
 while (virtqueue_pop(vq, elem)) {
-handle_control_message(vser, elem.out_sg[0].iov_base);
-virtqueue_push(vq, elem, elem.out_sg[0].iov_len);
+size_t cur_len, copied;
+
+cur_len = iov_size(elem.out_sg, elem.out_num);
+/*
+ * Allocate a new buf only if we didn't have one previously or
+ * if the size of the buf differs
+ */
+if (cur_len != len) {
+if (len) {
+qemu_free(buf);
+}
+buf = qemu_malloc(cur_len);
+len = cur_len;
+}
+copied = iov_to_buf(elem.out_sg, elem.out_num, buf, 0, len);
+
+handle_control_message(vser, buf, copied);
+virtqueue_push(vq, elem, copied);
+}
+if (len) {
+qemu_free(buf);
 }
 virtio_notify(vdev, vq);
 }
-- 
1.6.2.5





[Qemu-devel] [PATCH 15/15] virtio-serial: Handle scatter/gather input from the guest

2010-03-24 Thread Amit Shah
Current guests don't send more than one iov but it can change later.
Ensure we handle that case.

Signed-off-by: Amit Shah amit.s...@redhat.com
CC: Avi Kivity a...@redhat.com
---
 hw/virtio-serial-bus.c |   12 
 1 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
index 3edfeca..42e4ed0 100644
--- a/hw/virtio-serial-bus.c
+++ b/hw/virtio-serial-bus.c
@@ -369,7 +369,8 @@ static void handle_output(VirtIODevice *vdev, VirtQueue *vq)
 
 while (virtqueue_pop(vq, elem)) {
 VirtIOSerialPort *port;
-size_t ret;
+uint8_t *buf;
+size_t ret, buf_size;
 
 port = find_port_by_vq(vser, vq);
 if (!port) {
@@ -392,9 +393,12 @@ static void handle_output(VirtIODevice *vdev, VirtQueue 
*vq)
 goto next_buf;
 }
 
-/* The guest always sends only one sg */
-ret = port-info-have_data(port, elem.out_sg[0].iov_base,
-elem.out_sg[0].iov_len);
+buf_size = iov_size(elem.out_sg, elem.out_num);
+buf = qemu_malloc(buf_size);
+ret = iov_to_buf(elem.out_sg, elem.out_num, buf, 0, buf_size);
+
+ret = port-info-have_data(port, buf, ret);
+qemu_free(buf);
 
 next_buf:
 virtqueue_push(vq, elem, ret);
-- 
1.6.2.5





Re: [Qemu-devel] Re: [libvirt] Supporting hypervisor specific APIs in libvirt

2010-03-24 Thread Markus Armbruster
Paul Brook p...@codesourcery.com writes:

  IMO the no_user flag is a bug, and should not exist.
 
 Sorry, what's that?

 Usually an indication that a device has been incorrectly or inproperly 
 converted to the qdev interface.

Can also be an indication that the device can't support multiple
instances.  For instance:

commit 39a51dfda835a75c0ebbfd92705b96e4de77f795
Author: Markus Armbruster arm...@redhat.com
Date:   Tue Oct 27 13:52:13 2009 +0100

qdev: Tag isa-fdc, PIIX3 IDE and PIIX4 IDE as no-user

These devices are created automatically, and attempting to create
another one with -device fails with qemu: hardware error:
register_ioport_write: invalid opaque.

Signed-off-by: Markus Armbruster arm...@redhat.com
Signed-off-by: Anthony Liguori aligu...@us.ibm.com

With no-user, we at least fail with a decent error message.

I don't think it's fair to demand that a qdev conversion must relax
restrictions that haven't otherwise bothered us to be correct and
proper.  We'll relax them if and when they bother us enough to make
somebody send a decent patch.

And yes, there are better ways to disallow multiple instances of a
device than declaring it no-user.  Patches welcome.




Re: [Qemu-devel] [PATCH] qemu: jaso-parser: Output the content of invalid keyword

2010-03-24 Thread Markus Armbruster
Amos Kong ak...@redhat.com writes:

 When input some invialid word 'unknowcmd' through QMP port, qemu outputs this
 error message:
 parse error: invalid keyword `%s'
 This patch makes qemu output the content of invalid keyword, like:
 parse error: invalid keyword `unknowcmd'

 Signed-off-by: Amos Kong ak...@redhat.com

Looks good to me.

Hint: it's best to put a version in the subject when you respin, like
[PATCH v2] ...




Re: [Qemu-devel] Re: [libvirt] Supporting hypervisor specific APIs in libvirt

2010-03-24 Thread Paul Brook
 Paul Brook p...@codesourcery.com writes:
   IMO the no_user flag is a bug, and should not exist.
 
  Sorry, what's that?
 
  Usually an indication that a device has been incorrectly or inproperly
  converted to the qdev interface.
 
 Can also be an indication that the device can't support multiple
 instances.  For instance:
qdev: Tag isa-fdc, PIIX3 IDE and PIIX4 IDE as no-user

I still claim this is a bug, and see no good reason why we shouldn't support 
multiple floppy controllers, ISA busses, etc.

Paul




Re: [Qemu-devel] Re: Compile files only once: some planning

2010-03-24 Thread Paolo Bonzini

On 03/24/2010 03:56 PM, Paul Brook wrote:

On 03/24/2010 12:19 PM, Richard Henderson wrote:

On 03/24/2010 02:47 AM, Paolo Bonzini wrote:

1) make CPUState define only common fields. Include CPUState at the
beginning of each per-target CPUXYZState.


Irritatingly, the common fields contain quite big TLBs. And the
offsets from the start of env affect the compactness of the code
generated from TCG. We really really want the general registers
to come first to make sure that those offsets fit the host's
reg+offset addressing mode.


What about adding a 512-bytes (or more) block or something like that at
the beginning of CPUState with a union, so you can put the per-target
stuff there?


Is it really worth the hassle? Anything touching CPUState is probably going to
be CPU specific anyway.


qemu-timer.c, hw/dma.c is not and these are the first two files I looked 
at.  translate-all.c is the third, and it is except for a trivial cleanup.


Big parts of vl.c are independent too.

As a quick check:

$ git grep -l 'CPUState' | grep -ve ^tcg -e ^target- | wc -l
96
$ git grep -l 'CPUState' | grep -ve ^tcg -e ^target- | \
 xargs grep -l '#if.*TARGET_' | wc -l
36

The ones that remain are pretty much what would you expect, besides 
translate-all.c and some in hw/ which I snipped:


bsd-user/main.c
darwin-user/main.c darwin-user/qemu.h darwin-user/signal.c
linux-user/elfload.c linux-user/main.c linux-user/qemu.h
linux-user/signal.c linux-user/syscall.c

cpu-all.h cpu-defs.h cpu-exec.c
def-helper.h
disas.c
exec-all.h exec.c
gdbstub.c
monitor.c
translate-all.c
vl.c

Of course this doesn't mean that 60 files are target-independent, but 
30-ish probably are or can be made so.


It would also help code clarity to use CPUXYZState more, to understand 
which hw/ files are specific to one model.  For hw/s390-virtio.c that's 
obvious, but slightly less so for hw/sun4m.c and even less so for 
hw/syborg.c.  This is an independent cleanup.


Paolo




Re: [Qemu-devel] Re: [libvirt] Supporting hypervisor specific APIs in libvirt

2010-03-24 Thread Luiz Capitulino
On Wed, 24 Mar 2010 12:42:16 +0200
Avi Kivity a...@redhat.com wrote:

 So, at best qemud is a toy for people who are annoyed by libvirt.

 Is the reason for doing this in qemu because libvirt is annoying? I don't see
how adding yet another layer/daemon is going to improve ours and user's life
(the same applies for libqemu).

 If I got it right, there were two complaints from the kvm-devel flamewar:

1. Qemu has usability problems
2. There's no way an external tool can get /proc/kallsyms info from Qemu

 I don't see how libqemu can help with 1) and having qemud doesn't seem
the best solution for 2) either.

 Still talking about 2), what's wrong in getting the PID or having a QMP
connection in a well known location as suggested by Anthony?





Re: [Qemu-devel] Re: Compile files only once: some planning

2010-03-24 Thread Richard Henderson
On 03/24/2010 07:45 AM, Paolo Bonzini wrote:
 On 03/24/2010 12:19 PM, Richard Henderson wrote:
 On 03/24/2010 02:47 AM, Paolo Bonzini wrote:
 1) make CPUState define only common fields. Include CPUState at the
 beginning of each per-target CPUXYZState.

 Irritatingly, the common fields contain quite big TLBs. And the
 offsets from the start of env affect the compactness of the code
 generated from TCG. We really really want the general registers
 to come first to make sure that those offsets fit the host's
 reg+offset addressing mode.
 
 What about adding a 512-bytes (or more) block or something like that at
 the beginning of CPUState with a union, so you can put the per-target
 stuff there?

I think that would be confusing.

What might be just as good (although possibly just as confusing)
is to move the big members into a different structure.  E.g.

struct CPUSmallCommonState
{
// most of the stuff from CPU_COMMON.
// sorted for some thought of padding elimination.  ;-)
};

struct CPULargeCommonState
{
CPUTLBEntry tlb_table[NB_MMU_MODES][CPU_TLB_SIZE];
target_phys_addr_t iotlb[NB_MMU_MODES][CPU_TLB_SIZE];
struct TranslationBlock *tb_jmp_cache[TB_JMP_CACHE_SIZE];
jmp_buf jmp_env;
};

struct CPUXYZSmallState
{
CPUSmallCommonState common_s;
// the rest of the cpu-specific stuff.
};

struct CPUXYZLargeState
{
CPUXYZSmallState s;
CPUBigCommonState common_l;
};

extern int cpu_large_state_offset = offsetof(CPUXYZLargeState, common_l);

Now.  If you're compiling a file for which cpu-specific code is ok:

register CPUXYZLargeState *env __asm__(AREG0);
#define ENV_SMALL_COMMON_STATE(env-s.common_s)
#define ENV_LARGE_COMMON_STATE(env-common_l)

If you're compiling a file which is supposed to be independant of cpu:

register CPUSmallCommonState *env __asm__(AREG0);
#define ENV_SMALL_COMMON_STATE(env)
#define ENV_LARGE_COMMON_STATE((CPULargeCommonState *)((char *)env + 
cpu_large_state_offset))

For the gcc-compiled code, the addition of the cpu_large_state_offset
is probably more or less on par in efficiency with indirection.  But
for TCG generated code, the variable read happens at code generation
time, which means we *still* have a constant in the generated code.


r~




Re: [Qemu-devel] Re: Compile files only once: some planning

2010-03-24 Thread Paul Brook
  1) make CPUState define only common fields. Include CPUState at the
  beginning of each per-target CPUXYZState.
 
  Irritatingly, the common fields contain quite big TLBs. And the
  offsets from the start of env affect the compactness of the code
  generated from TCG. We really really want the general registers
  to come first to make sure that those offsets fit the host's
  reg+offset addressing mode.
 
  What about adding a 512-bytes (or more) block or something like that at
  the beginning of CPUState with a union, so you can put the per-target
  stuff there?
 
  Is it really worth the hassle? Anything touching CPUState is probably
  going to be CPU specific anyway.
 
 qemu-timer.c, hw/dma.c is not and these are the first two files I looked
 at.  translate-all.c is the third, and it is except for a trivial cleanup.

The use in hw/dma.c is incorrect.  See previous discussion about how 
qemu_bh_schedule_idle needs to go away.

I'm also unconvinced by your numbers. My i386-softmmu/ directory contains only 
43 object files, most of are device emulation and don't touch CPU state at 
all. arm-softmmu/ contains a good number more, but that's mostly board init 
(which needs to know which CPU it's creating), and devices that are only used 
by one board so noone's bothered to move them into libhw.

Paul




[Qemu-devel] [PATCH 0/3] Fix S390x target

2010-03-24 Thread Alexander Graf
We're in a bad situation with the S390 qemu target. It only works with KVM,
so people can't test it when they don't have access to a real S390 machine.

While trying to build and use s390x-softmmu again I stumbled across a couple
of issues, all addressed in this patch set. The patches should all not affect
non-s390 targets at all.

Alexander Graf (3):
  S390: Add stub for cpu_get_phys_page_debug
  S390: Tell user why VM creation failed
  S390: Don't compile in virtio-pci

 Makefile.objs  |3 ++-
 default-configs/arm-softmmu.mak|1 +
 default-configs/cris-softmmu.mak   |1 +
 default-configs/i386-softmmu.mak   |1 +
 default-configs/m68k-softmmu.mak   |1 +
 default-configs/microblaze-softmmu.mak |1 +
 default-configs/mips-softmmu.mak   |1 +
 default-configs/mips64-softmmu.mak |1 +
 default-configs/mips64el-softmmu.mak   |1 +
 default-configs/mipsel-softmmu.mak |1 +
 default-configs/ppc-softmmu.mak|1 +
 default-configs/ppc64-softmmu.mak  |1 +
 default-configs/ppcemb-softmmu.mak |1 +
 default-configs/sh4-softmmu.mak|1 +
 default-configs/sh4eb-softmmu.mak  |1 +
 default-configs/sparc-softmmu.mak  |1 +
 default-configs/sparc64-softmmu.mak|1 +
 default-configs/x86_64-softmmu.mak |1 +
 kvm-all.c  |7 ++-
 target-s390x/helper.c  |5 +
 20 files changed, 30 insertions(+), 2 deletions(-)





[Qemu-devel] [PATCH 1/3] S390: Add stub for cpu_get_phys_page_debug

2010-03-24 Thread Alexander Graf
We don't implement any virtual memory in the S390 target so far, so let's
add a stub for this now mandatory function.

Fixes building of S390 target.

Signed-off-by: Alexander Graf ag...@suse.de
---
 target-s390x/helper.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/target-s390x/helper.c b/target-s390x/helper.c
index ba0c052..4a5297b 100644
--- a/target-s390x/helper.c
+++ b/target-s390x/helper.c
@@ -58,6 +58,11 @@ void cpu_reset(CPUS390XState *env)
 tlb_flush(env, 1);
 }
 
+target_phys_addr_t cpu_get_phys_page_debug(CPUState *env, target_ulong addr)
+{
+return 0;
+}
+
 #ifndef CONFIG_USER_ONLY
 
 int cpu_s390x_handle_mmu_fault (CPUState *env, target_ulong address, int rw,
-- 
1.6.0.2





[Qemu-devel] [PATCH 3/3] S390: Don't compile in virtio-pci

2010-03-24 Thread Alexander Graf
As soon as virtio-pci.c gets compiled and used on S390 the internal qdev magic
gets confused and tries to give us PCI devices instead of S390 virtio devices.

Since we don't have PCI on S390, we can safely not compile virtio-pci at all.

In order to do this I added a new config option CONFIG_PCI that I enabled
for every platform except S390. Thanks to this the change should be a complete
nop for every other platform.

If anyone feels like their platform shouldn't support PCI either, just remove
the config option. If you think we should build even less when we don't have
PCI, feel free to come up with a follow-up patch.

Signed-off-by: Alexander Graf ag...@suse.de
---
 Makefile.objs  |3 ++-
 default-configs/arm-softmmu.mak|1 +
 default-configs/cris-softmmu.mak   |1 +
 default-configs/i386-softmmu.mak   |1 +
 default-configs/m68k-softmmu.mak   |1 +
 default-configs/microblaze-softmmu.mak |1 +
 default-configs/mips-softmmu.mak   |1 +
 default-configs/mips64-softmmu.mak |1 +
 default-configs/mips64el-softmmu.mak   |1 +
 default-configs/mipsel-softmmu.mak |1 +
 default-configs/ppc-softmmu.mak|1 +
 default-configs/ppc64-softmmu.mak  |1 +
 default-configs/ppcemb-softmmu.mak |1 +
 default-configs/sh4-softmmu.mak|1 +
 default-configs/sh4eb-softmmu.mak  |1 +
 default-configs/sparc-softmmu.mak  |1 +
 default-configs/sparc64-softmmu.mak|1 +
 default-configs/x86_64-softmmu.mak |1 +
 18 files changed, 19 insertions(+), 1 deletions(-)

diff --git a/Makefile.objs b/Makefile.objs
index 281f7a6..a6ce4f5 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -128,7 +128,8 @@ user-obj-y += cutils.o cache-utils.o
 
 hw-obj-y =
 hw-obj-y += loader.o
-hw-obj-y += virtio.o virtio-console.o virtio-pci.o
+hw-obj-y += virtio.o virtio-console.o
+hw-obj-$(CONFIG_PCI) += virtio-pci.o
 hw-obj-y += fw_cfg.o pci.o pci_host.o pcie_host.o
 hw-obj-y += watchdog.o
 hw-obj-$(CONFIG_ISA_MMIO) += isa_mmio.o
diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
index 02ad192..a3819e1 100644
--- a/default-configs/arm-softmmu.mak
+++ b/default-configs/arm-softmmu.mak
@@ -24,3 +24,4 @@ CONFIG_SSI_SD=y
 CONFIG_LAN9118=y
 CONFIG_SMC91C111=y
 CONFIG_DS1338=y
+CONFIG_PCI=y
diff --git a/default-configs/cris-softmmu.mak b/default-configs/cris-softmmu.mak
index 8711402..9377235 100644
--- a/default-configs/cris-softmmu.mak
+++ b/default-configs/cris-softmmu.mak
@@ -2,3 +2,4 @@
 
 CONFIG_NAND=y
 CONFIG_PTIMER=y
+CONFIG_PCI=y
diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
index 4dbf656..192dcb8 100644
--- a/default-configs/i386-softmmu.mak
+++ b/default-configs/i386-softmmu.mak
@@ -16,3 +16,4 @@ CONFIG_IDE_PIIX=y
 CONFIG_NE2000_ISA=y
 CONFIG_PIIX_PCI=y
 CONFIG_SOUND=y
+CONFIG_PCI=y
diff --git a/default-configs/m68k-softmmu.mak b/default-configs/m68k-softmmu.mak
index 0a78375..07e02d6 100644
--- a/default-configs/m68k-softmmu.mak
+++ b/default-configs/m68k-softmmu.mak
@@ -2,3 +2,4 @@
 
 CONFIG_GDBSTUB_XML=y
 CONFIG_PTIMER=y
+CONFIG_PCI=y
diff --git a/default-configs/microblaze-softmmu.mak 
b/default-configs/microblaze-softmmu.mak
index c800c16..ddc3c15 100644
--- a/default-configs/microblaze-softmmu.mak
+++ b/default-configs/microblaze-softmmu.mak
@@ -1,3 +1,4 @@
 # Default configuration for microblaze-softmmu
 
 CONFIG_PTIMER=y
+CONFIG_PCI=y
diff --git a/default-configs/mips-softmmu.mak b/default-configs/mips-softmmu.mak
index 345a093..4598a0d 100644
--- a/default-configs/mips-softmmu.mak
+++ b/default-configs/mips-softmmu.mak
@@ -16,3 +16,4 @@ CONFIG_IDE_ISA=y
 CONFIG_IDE_PIIX=y
 CONFIG_NE2000_ISA=y
 CONFIG_SOUND=y
+CONFIG_PCI=y
diff --git a/default-configs/mips64-softmmu.mak 
b/default-configs/mips64-softmmu.mak
index 5900ee6..6c7f3c6 100644
--- a/default-configs/mips64-softmmu.mak
+++ b/default-configs/mips64-softmmu.mak
@@ -16,3 +16,4 @@ CONFIG_IDE_ISA=y
 CONFIG_IDE_PIIX=y
 CONFIG_NE2000_ISA=y
 CONFIG_SOUND=y
+CONFIG_PCI=y
diff --git a/default-configs/mips64el-softmmu.mak 
b/default-configs/mips64el-softmmu.mak
index 3e1ba93..2dcac82 100644
--- a/default-configs/mips64el-softmmu.mak
+++ b/default-configs/mips64el-softmmu.mak
@@ -16,3 +16,4 @@ CONFIG_IDE_ISA=y
 CONFIG_IDE_PIIX=y
 CONFIG_NE2000_ISA=y
 CONFIG_SOUND=y
+CONFIG_PCI=y
diff --git a/default-configs/mipsel-softmmu.mak 
b/default-configs/mipsel-softmmu.mak
index 17b83d0..e3e3878 100644
--- a/default-configs/mipsel-softmmu.mak
+++ b/default-configs/mipsel-softmmu.mak
@@ -16,3 +16,4 @@ CONFIG_IDE_ISA=y
 CONFIG_IDE_PIIX=y
 CONFIG_NE2000_ISA=y
 CONFIG_SOUND=y
+CONFIG_PCI=y
diff --git a/default-configs/ppc-softmmu.mak b/default-configs/ppc-softmmu.mak
index 5fe591c..50932fa 100644
--- a/default-configs/ppc-softmmu.mak
+++ b/default-configs/ppc-softmmu.mak
@@ -15,3 +15,4 @@ CONFIG_IDE_ISA=y
 CONFIG_IDE_CMD646=y
 CONFIG_NE2000_ISA=y
 CONFIG_SOUND=y
+CONFIG_PCI=y
diff --git 

[Qemu-devel] [PATCH 2/3] S390: Tell user why VM creation failed

2010-03-24 Thread Alexander Graf
The KVM kernel module on S390 refuses to create a VM when the switch_amode
kernel parameter is not used.

Since that is not exactly obvious, let's give the user a nice warning.

Signed-off-by: Alexander Graf ag...@suse.de
---
 kvm-all.c |7 ++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 534ead0..acf7e31 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -609,8 +609,13 @@ int kvm_init(int smp_cpus)
 }
 
 s-vmfd = kvm_ioctl(s, KVM_CREATE_VM, 0);
-if (s-vmfd  0)
+if (s-vmfd  0) {
+#ifdef TARGET_S390X
+fprintf(stderr, Please add the 'switch_amode' kernel parameter to 
+your host kernel command line\n);
+#endif
 goto err;
+}
 
 /* initially, KVM allocated its own memory and we had to jump through
  * hooks to make phys_ram_base point to this.  Modern versions of KVM
-- 
1.6.0.2





[Qemu-devel] Re: Compile files only once: some planning

2010-03-24 Thread Blue Swirl
On 3/24/10, Juan Quintela quint...@redhat.com wrote:
 Blue Swirl blauwir...@gmail.com wrote:
   Hi,
  
   Here's some planning for getting most files compiled as few times as
   possible. Comments and suggestions are welcome.


 I took some thought about this at some point.  Problems here start from
  Recursive Makefile condered Harmful (tm).

  Look at how we jump through hops to be able to compile things in
  one/other side.

  We have:
  Makefile
  Makefile.target (really lots of them, one for target)
  Makefile.hw
  Makefile.user

  If we had only a single Makefile, things in this department would be
  much, much easier. And no, convert to a single Makefile is not trivial
  either, but it would make things easier.

  Why do we have several Makefiles?  Because we want to compile each file
  with different options.

  Why do we need to abuse so much VPATH?  Because we need to bring files
  randomly from $(ROOT), $(ROOT)/hw  $(ROOT)/$(TARGET).

Would it help if the Makefiles were scattered to each directory, for
example instead of Makefile.hw we had hw/Makefile?

  Problem here, there isn't a simple way to compile files for several
  target just once (no way to put them).

  Our main copmile rule is:

  $(QEMU_PROG): $(obj-y) $(obj-$(TARGET_BASE_ARCH)-y)
 $(call LINK,$(obj-y) $(obj-$(TARGET_BASE_ARCH)-y))


  (notice that things compiled in Makefile are trivial, they are already
  compiled just once by definition, problems are for all the qemu's we
  compile).

  We could change: $(obj-$(TARGET_BASE_ARCH)-y) to something like:

  OBJ-TARGET=s/.o/.$(TARGET_BASE_ARCH).o/

  (I forgot the subst Makefile syntax), and have the:

  %.$(TARGET_BASE_ARCH).o: %.c
gcc $(TARGET_BASE_ARCH options)

  From there, as you suggested, we need some files that are not compiled
  by architecture, they need to be compiled by board, well, we need to add
  yet another level obj-$(TARGET_BOARD) or whatever.

  Notice that this is a lot of work, but you are needing the audit to be
  able to compile only once.  Problem just now is that there is not a
  simple way to describe that information,  with my proposal it gets
  trivial to express:

  obj-$(CONFIG_FOO) += foo.o  # You need this for everything
  obj-mips-$(CONFIG_FOO) += foo.o  # You need this for all mips boards
  obj-malta-$(CONFIG_FOO) += foo.o  # You need this for all mips malta board

  You still need to do some different magic from hw-32/64 but it could be
  done this way.  Once you did it this way, you now where the files are
  (hw or target) and you can drop the VPATH tricks.

  Problem with this proposal is that it is not trivial to do in little
  steps, and the real big advantages appear when you switch to a single
  Makefile at the end.

I may have missed something, but the compile process doesn't care
about boards, because all boards for some architecture (and therefore
all devices used by all boards) are linked to a single
per-architecture executable. So why introduce the boards concept to
Makefiles?

   vl.c: a lot of work. Maybe the CPUState stuff should be separated to a new 
 file.


 That should just be a rule in Documentation.  You can't but anything
  else in vl.c.  If you move anything out of vl.c (see timers work from
  bonzini for example), you get a wild card for free commit bypassing
  maintainers or some similar price :)

Cleaning up vl.c would be great, but just for purpose of single
compilation, it's enough to put the CPUState pieces to a target
dependent file (cpu-common.c?) and compile the rest once by making
TARGET_xxx conditional code unconditional. This may still be doable.

  rest of files

  I haven't really looked at them at depth.

  I looked when I cleaned up the build system, I thought how to do the
  next step (outlined before), but got sidetracked by other more urgent
  things.

Thanks for the comments.




[Qemu-devel] Re: [RFC] vhost-blk implementation

2010-03-24 Thread Badari Pulavarty

Michael S. Tsirkin wrote:

On Tue, Mar 23, 2010 at 12:55:07PM -0700, Badari Pulavarty wrote:
  

Michael S. Tsirkin wrote:


On Tue, Mar 23, 2010 at 10:57:33AM -0700, Badari Pulavarty wrote:
  
  

Michael S. Tsirkin wrote:



On Mon, Mar 22, 2010 at 05:34:04PM -0700, Badari Pulavarty wrote:

  

Write Results:
==

I see degraded IO performance when doing sequential IO write
tests with vhost-blk compared to virtio-blk.

# time dd of=/dev/vda if=/dev/zero bs=2M oflag=direct

I get ~110MB/sec with virtio-blk, but I get only ~60MB/sec with
vhost-blk. Wondering why ?



Try to look and number of interrupts and/or number of exits.

  
I checked interrupts and IO exits - there is no major noticeable   
difference between

vhost-blk and virtio-blk scenerios.



It could also be that you are overrunning some queue.

I don't see any exit mitigation strategy in your patch:
when there are already lots of requests in a queue, it's usually
a good idea to disable notifications and poll the
queue as requests complete. That could help performance.

  
Do you mean poll eventfd for new requests instead of waiting for new  
notifications ?

Where do you do that in vhost-net code ?



vhost_disable_notify does this.

  
  
Unlike network socket, since we are dealing with a file, there is no  
-poll support for it.
So I can't poll for the data. And also, Issue I am having is on the   
write() side.



Not sure I understand.

  
  

I looked at it some more - I see 512K write requests on the
virtio-queue  in both vhost-blk and virtio-blk cases. Both qemu or
vhost is doing synchronous  writes to page cache (there is no write
batching in qemu that is affecting this  case).  I still puzzled on
why virtio-blk outperforms vhost-blk.

Thanks,
Badari



If you say the number of requests is the same, we are left with:
- requests are smaller for some reason?
- something is causing retries?
  
  
No. IO requests sizes are exactly same (512K) in both cases. There are  
no retries or
errors in both cases. One thing I am not clear is - for some reason  
guest kernel
could push more data into virtio-ring in case of virtio-blk vs  
vhost-blk. Is this possible ?
Does guest gets to run much sooner in virtio-blk case than vhost-blk ?  
Sorry, if its dumb question -

I don't understand  all the vhost details :(

Thanks,
Badari



BTW, did you put the backend in non-blocking mode?
As I said, vhost net passes non-blocking flag to
socket backend, but vfs_write/read that you use does
not have an option to do this.

So we'll need to extend the backend to fix that,
but just for demo purposes, you could set non-blocking
mode on the file from userspace.

  

Michael,

Atleast I understand why the performance difference now.. My debug
code is changed the behaviour of virtio-blk which confused me.

1) virtio-blk is able to batch up writes from various requests.
2) virtio-blk offloads the writes to different thread

Where as vhost-blk, I do each request syncrhonously. Hence
the difference. You are right - i have to make backend async.
I will working on handing off work to in-kernel threads.
I am not sure about IO completion handling and calling
vhost_add_used_and_signal() out of context. But let
me take a stab at it and ask your help if I run into
issues.

Thanks,
Badari







Re: [Qemu-devel] Re: Compile files only once: some planning

2010-03-24 Thread Blue Swirl
On 3/24/10, Richard Henderson r...@twiddle.net wrote:
 On 03/24/2010 02:47 AM, Paolo Bonzini wrote:

  1) make CPUState define only common fields. Include CPUState at the
  beginning of each per-target CPUXYZState.
 

  Irritatingly, the common fields contain quite big TLBs.  And the
  offsets from the start of env affect the compactness of the code
  generated from TCG.  We really really want the general registers
  to come first to make sure that those offsets fit the host's
  reg+offset addressing mode.

One trick is to define a fixed offset (about half CPUState size) and
make env point to CPUState + offset. Then the negative part of the
offset space can be used efficiently. But this just doubles the space
that can be accessed fast, so it's not a big win.




Re: [Qemu-devel] [PATCH 3/3] S390: Don't compile in virtio-pci

2010-03-24 Thread Blue Swirl
On 3/24/10, Alexander Graf ag...@suse.de wrote:
 As soon as virtio-pci.c gets compiled and used on S390 the internal qdev magic
  gets confused and tries to give us PCI devices instead of S390 virtio 
 devices.

  Since we don't have PCI on S390, we can safely not compile virtio-pci at all.

  In order to do this I added a new config option CONFIG_PCI that I enabled
  for every platform except S390. Thanks to this the change should be a 
 complete
  nop for every other platform.

The name should be CONFIG_VIRTIO_PCI, CONFIG_PCI would enable
compilation of pci.c.

  If anyone feels like their platform shouldn't support PCI either, just remove
  the config option. If you think we should build even less when we don't have
  PCI, feel free to come up with a follow-up patch.

None of the currently supported Sparc32 boards have PCI. There are
some real devices with PCI (JavaStations) though.




Re: [Qemu-devel] [PATCH 3/3] S390: Don't compile in virtio-pci

2010-03-24 Thread Alexander Graf
Blue Swirl wrote:
 On 3/24/10, Alexander Graf ag...@suse.de wrote:
   
 As soon as virtio-pci.c gets compiled and used on S390 the internal qdev 
 magic
  gets confused and tries to give us PCI devices instead of S390 virtio 
 devices.

  Since we don't have PCI on S390, we can safely not compile virtio-pci at 
 all.

  In order to do this I added a new config option CONFIG_PCI that I enabled
  for every platform except S390. Thanks to this the change should be a 
 complete
  nop for every other platform.
 

 The name should be CONFIG_VIRTIO_PCI, CONFIG_PCI would enable
 compilation of pci.c.
   

My idea was to keep the config option generic and rip out all PCI stuff
from s390x-softmmu. We don't even do MMIO :-).

Alex




[Qemu-devel] Re: [RFC] vhost-blk implementation

2010-03-24 Thread Michael S. Tsirkin
On Wed, Mar 24, 2010 at 10:58:50AM -0700, Badari Pulavarty wrote:
 Michael S. Tsirkin wrote:
 On Tue, Mar 23, 2010 at 12:55:07PM -0700, Badari Pulavarty wrote:
   
 Michael S. Tsirkin wrote:
 
 On Tue, Mar 23, 2010 at 10:57:33AM -0700, Badari Pulavarty wrote:
 
 Michael S. Tsirkin wrote:
 
 On Mon, Mar 22, 2010 at 05:34:04PM -0700, Badari Pulavarty wrote:
   
 Write Results:
 ==

 I see degraded IO performance when doing sequential IO write
 tests with vhost-blk compared to virtio-blk.

 # time dd of=/dev/vda if=/dev/zero bs=2M oflag=direct

 I get ~110MB/sec with virtio-blk, but I get only ~60MB/sec with
 vhost-blk. Wondering why ?
 
 Try to look and number of interrupts and/or number of exits.
   
 I checked interrupts and IO exits - there is no major noticeable  
  difference between
 vhost-blk and virtio-blk scenerios.
 
 It could also be that you are overrunning some queue.

 I don't see any exit mitigation strategy in your patch:
 when there are already lots of requests in a queue, it's usually
 a good idea to disable notifications and poll the
 queue as requests complete. That could help performance.
   
 Do you mean poll eventfd for new requests instead of waiting for 
 new  notifications ?
 Where do you do that in vhost-net code ?
 
 vhost_disable_notify does this.

 
 Unlike network socket, since we are dealing with a file, there is 
 no  -poll support for it.
 So I can't poll for the data. And also, Issue I am having is on 
 the   write() side.
 
 Not sure I understand.

 
 I looked at it some more - I see 512K write requests on the
 virtio-queue  in both vhost-blk and virtio-blk cases. Both qemu or
 vhost is doing synchronous  writes to page cache (there is no write
 batching in qemu that is affecting this  case).  I still puzzled on
 why virtio-blk outperforms vhost-blk.

 Thanks,
 Badari
 
 If you say the number of requests is the same, we are left with:
 - requests are smaller for some reason?
 - something is causing retries?
 
 No. IO requests sizes are exactly same (512K) in both cases. There 
 are  no retries or
 errors in both cases. One thing I am not clear is - for some reason   
 guest kernel
 could push more data into virtio-ring in case of virtio-blk vs   
 vhost-blk. Is this possible ?
 Does guest gets to run much sooner in virtio-blk case than vhost-blk 
 ?  Sorry, if its dumb question -
 I don't understand  all the vhost details :(

 Thanks,
 Badari
 

 BTW, did you put the backend in non-blocking mode?
 As I said, vhost net passes non-blocking flag to
 socket backend, but vfs_write/read that you use does
 not have an option to do this.

 So we'll need to extend the backend to fix that,
 but just for demo purposes, you could set non-blocking
 mode on the file from userspace.

   
 Michael,

 Atleast I understand why the performance difference now.. My debug
 code is changed the behaviour of virtio-blk which confused me.

 1) virtio-blk is able to batch up writes from various requests.
 2) virtio-blk offloads the writes to different thread

 Where as vhost-blk, I do each request syncrhonously. Hence
 the difference. You are right - i have to make backend async.
 I will working on handing off work to in-kernel threads.
 I am not sure about IO completion handling and calling
 vhost_add_used_and_signal() out of context. But let
 me take a stab at it and ask your help if I run into
 issues.

 Thanks,
 Badari



The way I did it for vhost net, requests are synchronous
but non-blocking. So if it can't be done directly,
I delay it.

-- 
MST




Re: [Qemu-devel] [PATCH 3/3] S390: Don't compile in virtio-pci

2010-03-24 Thread Blue Swirl
On 3/24/10, Alexander Graf ag...@suse.de wrote:
 Blue Swirl wrote:
   On 3/24/10, Alexander Graf ag...@suse.de wrote:
  
   As soon as virtio-pci.c gets compiled and used on S390 the internal qdev 
 magic
gets confused and tries to give us PCI devices instead of S390 virtio 
 devices.
  
Since we don't have PCI on S390, we can safely not compile virtio-pci at 
 all.
  
In order to do this I added a new config option CONFIG_PCI that I 
 enabled
for every platform except S390. Thanks to this the change should be a 
 complete
nop for every other platform.
  
  
   The name should be CONFIG_VIRTIO_PCI, CONFIG_PCI would enable
   compilation of pci.c.
  


 My idea was to keep the config option generic and rip out all PCI stuff
  from s390x-softmmu. We don't even do MMIO :-).

In that case you should also rip out all PCI cards.




[Qemu-devel] Re: [PATCH 0/4] monitor: Convert do_set_link() to QObject, QError

2010-03-24 Thread Luiz Capitulino
On Tue, 23 Mar 2010 11:27:54 +0100
Markus Armbruster arm...@redhat.com wrote:

 PATCH 3/4 changes syntax of set_link's second argument from up|down to
 on|off.  I feel that the argument needs to be boolean in QMP, and this
 is the simplest way to get it.
 
 Alternatives I could try if the syntax change is unwanted:
 
 * Use the old string argument in QMP.  Easy.
 
 * Don't convert set_link, create a new command with a boolean
   argument.
 
 * Create a argument parser for up|down.

 I like your approach. Daniel do you use set_link in libvirt already?
I've grepped around I didn't found any reference for it.

 
 Markus Armbruster (4):
   monitor: Rename argument type 'b' to 'f'
   monitor: New argument type 'b'
   monitor: Use argument type 'b' for set_link
   monitor: Convert do_set_link() to QObject, QError
 
  monitor.c   |   39 +++
  net.c   |   17 ++---
  net.h   |2 +-
  qemu-monitor.hx |   13 +++--
  4 files changed, 49 insertions(+), 22 deletions(-)
 





[Qemu-devel] [PATCH 2/2] machine opts framework

2010-03-24 Thread Glauber Costa
This patch adds initial support for the -machine option, that allows
command line specification of machine attributes (always relying on safe
defaults). Besides its value per-se, it is the saner way we found to
allow for enabling/disabling of kvm's in-kernel irqchip.

A machine with in-kernel-irqchip could be specified as:
-machine irqchip=apic-kvm
And one without it:
-machine irqchip=apic

To demonstrate how it'd work, this patch introduces a choice between
pic and apic, pic being the old-style isa thing. 

---
 hw/boards.h |   10 ++
 hw/pc.c |   45 +++--
 qemu-config.c   |   16 
 qemu-config.h   |1 +
 qemu-options.hx |9 +
 vl.c|   54 ++
 6 files changed, 129 insertions(+), 6 deletions(-)

diff --git a/hw/boards.h b/hw/boards.h
index 6f0f0d7..831728c 100644
--- a/hw/boards.h
+++ b/hw/boards.h
@@ -12,6 +12,15 @@ typedef void QEMUMachineInitFunc(ram_addr_t ram_size,
  const char *initrd_filename,
  const char *cpu_model);
 
+typedef void (QEMUIrqchipFunc)(void *opaque);
+
+typedef struct QEMUIrqchip {
+const char *name;
+QEMUIrqchipFunc *init; 
+int used;
+int is_default;
+} QEMUIrqchip;
+
 typedef struct QEMUMachine {
 const char *name;
 const char *alias;
@@ -28,6 +37,7 @@ typedef struct QEMUMachine {
 no_sdcard:1;
 int is_default;
 GlobalProperty *compat_props;
+QEMUIrqchip *irqchip;
 struct QEMUMachine *next;
 } QEMUMachine;
 
diff --git a/hw/pc.c b/hw/pc.c
index ba14df0..43ec022 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -752,21 +752,43 @@ int cpu_is_bsp(CPUState *env)
 return env-cpu_index == 0;
 }
 
+static void qemu_apic_init(void *opaque)
+{
+CPUState *env = opaque;
+if (!(env-cpuid_features  CPUID_APIC)) {
+fprintf(stderr, CPU lacks APIC cpuid flag\n);
+exit(1);
+}
+env-cpuid_apic_id = env-cpu_index;
+/* APIC reset callback resets cpu */
+apic_init(env);
+}
+
+static void qemu_pic_init(void *opaque)
+{
+CPUState *env = opaque;
+
+if (smp_cpus  1) {
+fprintf(stderr, PIC can't support smp systems\n);
+exit(1);
+}
+qemu_register_reset((QEMUResetHandler*)cpu_reset, env);
+}
+
 static CPUState *pc_new_cpu(const char *cpu_model)
 {
 CPUState *env;
+QEMUIrqchip *ic;
 
 env = cpu_init(cpu_model);
 if (!env) {
 fprintf(stderr, Unable to find x86 CPU definition\n);
 exit(1);
 }
-if ((env-cpuid_features  CPUID_APIC) || smp_cpus  1) {
-env-cpuid_apic_id = env-cpu_index;
-/* APIC reset callback resets cpu */
-apic_init(env);
-} else {
-qemu_register_reset((QEMUResetHandler*)cpu_reset, env);
+
+for (ic = current_machine-irqchip; ic-name != NULL; ic++) {
+if (ic-used)
+ic-init(env);
 }
 return env;
 }
@@ -1074,6 +1096,17 @@ static QEMUMachine pc_machine = {
 .desc = Standard PC,
 .init = pc_init_pci,
 .max_cpus = 255,
+.irqchip = (QEMUIrqchip[]){
+{
+.name = apic,
+.init = qemu_apic_init,
+.is_default = 1,
+},{
+.name = pic,
+.init = qemu_pic_init,
+},
+{ /* end of list */ },
+},
 .is_default = 1,
 };
 
diff --git a/qemu-config.c b/qemu-config.c
index 150157c..2b985a9 100644
--- a/qemu-config.c
+++ b/qemu-config.c
@@ -296,6 +296,21 @@ QemuOptsList qemu_cpudef_opts = {
 },
 };
 
+QemuOptsList qemu_machine_opts = {
+.name = M,
+.head = QTAILQ_HEAD_INITIALIZER(qemu_machine_opts.head),
+.desc = {
+{
+.name = mach,
+.type = QEMU_OPT_STRING,
+},{
+.name = irqchip,
+.type = QEMU_OPT_STRING,
+},
+{ /* end of list */ }
+},
+};
+
 static QemuOptsList *lists[] = {
 qemu_drive_opts,
 qemu_chardev_opts,
@@ -306,6 +321,7 @@ static QemuOptsList *lists[] = {
 qemu_global_opts,
 qemu_mon_opts,
 qemu_cpudef_opts,
+qemu_machine_opts,
 NULL,
 };
 
diff --git a/qemu-config.h b/qemu-config.h
index f217c58..ea302f0 100644
--- a/qemu-config.h
+++ b/qemu-config.h
@@ -10,6 +10,7 @@ extern QemuOptsList qemu_rtc_opts;
 extern QemuOptsList qemu_global_opts;
 extern QemuOptsList qemu_mon_opts;
 extern QemuOptsList qemu_cpudef_opts;
+extern QemuOptsList qemu_machine_opts;
 
 QemuOptsList *qemu_find_opts(const char *group);
 int qemu_set_option(const char *str);
diff --git a/qemu-options.hx b/qemu-options.hx
index 8450b45..585ecd2 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -34,6 +34,15 @@ STEXI
 Select the emulated @var{machine} (@code{-M ?} for list)
 ETEXI
 
+DEF(machine, HAS_ARG, QEMU_OPTION_machine,
+-machine mach=m[,irqchip=chip]\n
+select emulated machine (-machine ? for list)\n)
+STEXI
+...@item -machine 

[Qemu-devel] [PATCH 1/2] early set current_machine

2010-03-24 Thread Glauber Costa
this way, the machine_init function itself can know which machine is current
in use, not only the late init code.
---
 vl.c |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/vl.c b/vl.c
index d69250c..ceddeac 100644
--- a/vl.c
+++ b/vl.c
@@ -4841,6 +4841,9 @@ int main(int argc, char **argv, char **envp)
 }
 qemu_add_globals();
 
+
+current_machine = machine;
+
 machine-init(ram_size, boot_devices,
   kernel_filename, kernel_cmdline, initrd_filename, cpu_model);
 
@@ -4859,8 +4862,6 @@ int main(int argc, char **argv, char **envp)
 }
 }
 
-current_machine = machine;
-
 /* init USB devices */
 if (usb_enabled) {
 if (foreach_device_config(DEV_USB, usb_parse)  0)
-- 
1.6.6





[Qemu-devel] Re: Compile files only once: some planning

2010-03-24 Thread Juan Quintela
Blue Swirl blauwir...@gmail.com wrote:
 On 3/24/10, Juan Quintela quint...@redhat.com wrote:
 Blue Swirl blauwir...@gmail.com wrote:
   Hi,
  
   Here's some planning for getting most files compiled as few times as
   possible. Comments and suggestions are welcome.


 I took some thought about this at some point.  Problems here start from
  Recursive Makefile condered Harmful (tm).

  Look at how we jump through hops to be able to compile things in
  one/other side.

  We have:
  Makefile
  Makefile.target (really lots of them, one for target)
  Makefile.hw
  Makefile.user

  If we had only a single Makefile, things in this department would be
  much, much easier. And no, convert to a single Makefile is not trivial
  either, but it would make things easier.

  Why do we have several Makefiles?  Because we want to compile each file
  with different options.

  Why do we need to abuse so much VPATH?  Because we need to bring files
  randomly from $(ROOT), $(ROOT)/hw  $(ROOT)/$(TARGET).

 Would it help if the Makefiles were scattered to each directory, for
 example instead of Makefile.hw we had hw/Makefile?

The interesting one is Makefile.target, hw/Makefile could help, but that
is far away from where the action is.

If you look at Makefile.target from far enough, you will see that it
basically has:

libobj-$(CONFIG_FOO) = ...

ifdef CONFIG_LINUX_USER

endif

ifdef CONFIG_DARWIN_USER
...
endif

ifdef CONFIG_BSD_USER
...
endif

ifdef CONFIG_SOFTMMU
...
endif


The shared bits are very small (out of the libobj-y stuff).
Spliting the others in different Makefiles (or whatever is easy).  How
to get this ones compiled only once per BASE_ARCH/whatever should put us
near the goal of a single Makefile (and compiling each thing just the
number of times required).  Some thought is needed to know how to work here.

Actually, Anthony suggested at some point to just use 64 bits for
TARGET_PHYS_ADDR_BITS and remove the need for hw32/64.

I think that people emulationg 32bits on 32bits would suffer, but have
no clue how much.  Anthony, what was the idea?

  Problem with this proposal is that it is not trivial to do in little
  steps, and the real big advantages appear when you switch to a single
  Makefile at the end.

 I may have missed something, but the compile process doesn't care
 about boards, because all boards for some architecture (and therefore
 all devices used by all boards) are linked to a single
 per-architecture executable. So why introduce the boards concept to
 Makefiles?

I missunderstood this bit of your previous message:

 The target dependent cases should be next. On full build, each MIPS
 device file gets compiled four times, PPC files three times and x86
 twice. The devices for architectures that are compiled only once (ARM,
 Cris, Sparc32 etc.) do not need to be touched.

I was refering to this ones, but somehow got confused with boards :(


   vl.c: a lot of work. Maybe the CPUState stuff should be separated to a 
 new file.


 That should just be a rule in Documentation.  You can't but anything
  else in vl.c.  If you move anything out of vl.c (see timers work from
  bonzini for example), you get a wild card for free commit bypassing
  maintainers or some similar price :)

 Cleaning up vl.c would be great, but just for purpose of single
 compilation, it's enough to put the CPUState pieces to a target
 dependent file (cpu-common.c?) and compile the rest once by making
 TARGET_xxx conditional code unconditional. This may still be doable.

I haven't looked at detail at this :(

  rest of files

  I haven't really looked at them at depth.

  I looked when I cleaned up the build system, I thought how to do the
  next step (outlined before), but got sidetracked by other more urgent
  things.

 Thanks for the comments.

You are welcome.

Later, Juan.




[Qemu-devel] Re: [PATCH 2/4] monitor: New argument type 'b'

2010-03-24 Thread Luiz Capitulino
On Tue, 23 Mar 2010 11:27:56 +0100
Markus Armbruster arm...@redhat.com wrote:

 This is a boolean value.  Human monitor accepts on or off.
 Consistent with option parsing (see parse_option_bool()).
 
 Signed-off-by: Markus Armbruster arm...@redhat.com
 ---
  monitor.c |   31 +++
  1 files changed, 31 insertions(+), 0 deletions(-)
 
 diff --git a/monitor.c b/monitor.c
 index 3ce9a4e..47b68a2 100644
 --- a/monitor.c
 +++ b/monitor.c
 @@ -85,6 +85,8 @@
   *
   * '?'  optional type (for all types, except '/')
   * '.'  other form of optional type (for 'i' and 'l')
 + * 'b'  boolean
 + *  user mode accepts on or off
   * '-'  optional parameter (eg. '-f')
   *
   */
 @@ -3841,6 +3843,29 @@ static const mon_cmd_t *monitor_parse_command(Monitor 
 *mon,
  qdict_put(qdict, key, qfloat_from_double(val));
  }
  break;
 +case 'b':
 +{
 +const char *beg;
 +int val;
 +
 +while (qemu_isspace(*p)) {
 +p++;
 +}
 +beg = p;
 +while (qemu_isgraph(*p)) {
 +p++;
 +}
 +if (!strncmp(beg, on, p - beg)) {
 +val = 1;
 +} else if (!strncmp(beg, off, p - beg)) {
 +val = 0;
 +} else {
 +monitor_printf(mon, Expected 'on' or 'off'\n);
 +goto fail;
 +}

 This will make 'on' be the default when no on/off is specified, is that
your intention? I'm wondering if this can cause problems when you add
optional support for it and mixes it with other arguments.

 +qdict_put(qdict, key, qbool_from_int(val));
 +}
 +break;
  case '-':
  {
  const char *tmp = p;
 @@ -4322,6 +4347,12 @@ static int check_arg(const CmdArgs *cmd_args, QDict 
 *args)
  return -1;
  }
  break;
 +case 'b':
 +if (qobject_type(value) != QTYPE_QBOOL) {
 +qerror_report(QERR_INVALID_PARAMETER_TYPE, name, bool);
 +return -1;
 +}
 +break;
  case '-':
  if (qobject_type(value) != QTYPE_QINT 
  qobject_type(value) != QTYPE_QBOOL) {





[Qemu-devel] Re: [PATCH 3/3] monitor: Convert do_migrate() to QError

2010-03-24 Thread Luiz Capitulino
On Tue, 23 Mar 2010 19:07:21 +0100
Markus Armbruster arm...@redhat.com wrote:

 Human monitor error message changes from unknown migration protocol:
 FOO to Invalid parameter uri.
 
 The conversion is shallow: the FOO_start_outgoing_migration() aren't
 converted.  Converting them is a big job for relatively little
 practical benefit, so leave it for later.
 
 Signed-off-by: Markus Armbruster arm...@redhat.com
 ---
  migration.c |9 +
  1 files changed, 5 insertions(+), 4 deletions(-)
 
 diff --git a/migration.c b/migration.c
 index 05f6cc5..47d2ab5 100644
 --- a/migration.c
 +++ b/migration.c
 @@ -56,14 +56,14 @@ void qemu_start_incoming_migration(const char *uri)
  
  int do_migrate(Monitor *mon, const QDict *qdict, QObject **ret_data)
  {
 -MigrationState *s = NULL;
 +MigrationState *s;
  const char *p;
  int detach = qdict_get_int(qdict, detach);
  const char *uri = qdict_get_str(qdict, uri);
  
  if (current_migration 
  current_migration-get_status(current_migration) == 
 MIG_STATE_ACTIVE) {
 -monitor_printf(mon, migration already in progress\n);
 +qerror_report(QERR_MIGRATION_IN_PROGRESS);
  return -1;
  }

 What about QERR_OPERATION_IN_PROGRESS? So that we have:

Operation already in progress: migration.

  
 @@ -86,12 +86,13 @@ int do_migrate(Monitor *mon, const QDict *qdict, QObject 
 **ret_data)
  (int)qdict_get_int(qdict, inc));
  #endif
  } else {
 -monitor_printf(mon, unknown migration protocol: %s\n, uri);
 +qerror_report(QERR_INVALID_PARAMETER, uri);
  return -1;
  }
  
  if (s == NULL) {
 -monitor_printf(mon, migration failed\n);
 +/* TODO push error reporting into the FOO_start_outgoing_migration() 
 */
 +qerror_report(QERR_MIGRATION_FAILED);
  return -1;
  }

 I think this one is no better than the automatic UndefinedError
which is going to be triggered. I would only touch this when/if
we get the migration functions converted.




Re: [Qemu-devel] [PATCH 2/2] machine opts framework

2010-03-24 Thread Anthony Liguori

On 03/24/2010 02:26 PM, Glauber Costa wrote:

This patch adds initial support for the -machine option, that allows
command line specification of machine attributes (always relying on safe
defaults). Besides its value per-se, it is the saner way we found to
allow for enabling/disabling of kvm's in-kernel irqchip.

A machine with in-kernel-irqchip could be specified as:
-machine irqchip=apic-kvm
And one without it:
-machine irqchip=apic

To demonstrate how it'd work, this patch introduces a choice between
pic and apic, pic being the old-style isa thing.
   


I started from a different place.  See machine-qemuopts in my staging tree.

I think we should combine efforts.

Regards,

Anthony Liguori


---
  hw/boards.h |   10 ++
  hw/pc.c |   45 +++--
  qemu-config.c   |   16 
  qemu-config.h   |1 +
  qemu-options.hx |9 +
  vl.c|   54 ++
  6 files changed, 129 insertions(+), 6 deletions(-)

diff --git a/hw/boards.h b/hw/boards.h
index 6f0f0d7..831728c 100644
--- a/hw/boards.h
+++ b/hw/boards.h
@@ -12,6 +12,15 @@ typedef void QEMUMachineInitFunc(ram_addr_t ram_size,
   const char *initrd_filename,
   const char *cpu_model);

+typedef void (QEMUIrqchipFunc)(void *opaque);
+
+typedef struct QEMUIrqchip {
+const char *name;
+QEMUIrqchipFunc *init;
+int used;
+int is_default;
+} QEMUIrqchip;
+
  typedef struct QEMUMachine {
  const char *name;
  const char *alias;
@@ -28,6 +37,7 @@ typedef struct QEMUMachine {
  no_sdcard:1;
  int is_default;
  GlobalProperty *compat_props;
+QEMUIrqchip *irqchip;
  struct QEMUMachine *next;
  } QEMUMachine;

diff --git a/hw/pc.c b/hw/pc.c
index ba14df0..43ec022 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -752,21 +752,43 @@ int cpu_is_bsp(CPUState *env)
  return env-cpu_index == 0;
  }

+static void qemu_apic_init(void *opaque)
+{
+CPUState *env = opaque;
+if (!(env-cpuid_features  CPUID_APIC)) {
+fprintf(stderr, CPU lacks APIC cpuid flag\n);
+exit(1);
+}
+env-cpuid_apic_id = env-cpu_index;
+/* APIC reset callback resets cpu */
+apic_init(env);
+}
+
+static void qemu_pic_init(void *opaque)
+{
+CPUState *env = opaque;
+
+if (smp_cpus  1) {
+fprintf(stderr, PIC can't support smp systems\n);
+exit(1);
+}
+qemu_register_reset((QEMUResetHandler*)cpu_reset, env);
+}
+
  static CPUState *pc_new_cpu(const char *cpu_model)
  {
  CPUState *env;
+QEMUIrqchip *ic;

  env = cpu_init(cpu_model);
  if (!env) {
  fprintf(stderr, Unable to find x86 CPU definition\n);
  exit(1);
  }
-if ((env-cpuid_features  CPUID_APIC) || smp_cpus  1) {
-env-cpuid_apic_id = env-cpu_index;
-/* APIC reset callback resets cpu */
-apic_init(env);
-} else {
-qemu_register_reset((QEMUResetHandler*)cpu_reset, env);
+
+for (ic = current_machine-irqchip; ic-name != NULL; ic++) {
+if (ic-used)
+ic-init(env);
  }
  return env;
  }
@@ -1074,6 +1096,17 @@ static QEMUMachine pc_machine = {
  .desc = Standard PC,
  .init = pc_init_pci,
  .max_cpus = 255,
+.irqchip = (QEMUIrqchip[]){
+{
+.name = apic,
+.init = qemu_apic_init,
+.is_default = 1,
+},{
+.name = pic,
+.init = qemu_pic_init,
+},
+{ /* end of list */ },
+},
  .is_default = 1,
  };

diff --git a/qemu-config.c b/qemu-config.c
index 150157c..2b985a9 100644
--- a/qemu-config.c
+++ b/qemu-config.c
@@ -296,6 +296,21 @@ QemuOptsList qemu_cpudef_opts = {
  },
  };

+QemuOptsList qemu_machine_opts = {
+.name = M,
+.head = QTAILQ_HEAD_INITIALIZER(qemu_machine_opts.head),
+.desc = {
+{
+.name = mach,
+.type = QEMU_OPT_STRING,
+},{
+.name = irqchip,
+.type = QEMU_OPT_STRING,
+},
+{ /* end of list */ }
+},
+};
+
  static QemuOptsList *lists[] = {
  qemu_drive_opts,
  qemu_chardev_opts,
@@ -306,6 +321,7 @@ static QemuOptsList *lists[] = {
  qemu_global_opts,
  qemu_mon_opts,
  qemu_cpudef_opts,
+qemu_machine_opts,
  NULL,
  };

diff --git a/qemu-config.h b/qemu-config.h
index f217c58..ea302f0 100644
--- a/qemu-config.h
+++ b/qemu-config.h
@@ -10,6 +10,7 @@ extern QemuOptsList qemu_rtc_opts;
  extern QemuOptsList qemu_global_opts;
  extern QemuOptsList qemu_mon_opts;
  extern QemuOptsList qemu_cpudef_opts;
+extern QemuOptsList qemu_machine_opts;

  QemuOptsList *qemu_find_opts(const char *group);
  int qemu_set_option(const char *str);
diff --git a/qemu-options.hx b/qemu-options.hx
index 8450b45..585ecd2 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -34,6 +34,15 @@ STEXI
  Select the 

[Qemu-devel] Re: [PATCH] update bochs vbe interface

2010-03-24 Thread Gerd Hoffmann

On 03/24/10 18:04, Juan Quintela wrote:

Gerd Hoffmannkra...@redhat.com  wrote:

The bochs vbe interface got a new register a while back, which specifies
the linear framebuffer size in 64k units.  This patch adds support for
the new register to qemu.  With this patch applied vgabios 0.6c works
with qemu.

Signed-off-by: Gerd Hoffmannkra...@redhat.com


It breaks migration.

vga.c:2218:VMSTATE_UINT16_ARRAY(vbe_regs, VGACommonState, 
VBE_DISPI_INDEX_NB),



2218 is the interesting line.  Can we freeze this patch until the
subsections stuff is done?


Yea, I see.  Well, the new register doesn't carry any state, it is 
read-only information for the guest.  So the easy way out is to simply 
not save it as we don't have to.


cheers,
  Gerd





[Qemu-devel] [PATCH v2] update bochs vbe interface

2010-03-24 Thread Gerd Hoffmann
The bochs vbe interface got a new register a while back, which specifies
the linear framebuffer size in 64k units.  This patch adds support for
the new register to qemu.  With this patch applied vgabios 0.6c works
with qemu.

[ v2:  Don't savevm the new register.  Doing so breaks migration,
   and as it carries read-only information for the guest there
   is no need to save it. ]
---
 hw/vga.c |5 +++--
 hw/vga_int.h |6 +-
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/hw/vga.c b/hw/vga.c
index 6a1a059..f9e07cf 100644
--- a/hw/vga.c
+++ b/hw/vga.c
@@ -1955,7 +1955,8 @@ void vga_common_reset(VGACommonState *s)
 #ifdef CONFIG_BOCHS_VBE
 s-vbe_index = 0;
 memset(s-vbe_regs, '\0', sizeof(s-vbe_regs));
-s-vbe_regs[VBE_DISPI_INDEX_ID] = VBE_DISPI_ID0;
+s-vbe_regs[VBE_DISPI_INDEX_ID] = VBE_DISPI_ID5;
+s-vbe_regs[VBE_DISPI_INDEX_VIDEO_MEMORY_64K] = s-vram_size / (64 * 1024);
 s-vbe_start_addr = 0;
 s-vbe_line_offset = 0;
 s-vbe_bank_mask = (s-vram_size  16) - 1;
@@ -2215,7 +2216,7 @@ const VMStateDescription vmstate_vga_common = {
 VMSTATE_UINT8_EQUAL(is_vbe_vmstate, VGACommonState),
 #ifdef CONFIG_BOCHS_VBE
 VMSTATE_UINT16(vbe_index, VGACommonState),
-VMSTATE_UINT16_ARRAY(vbe_regs, VGACommonState, VBE_DISPI_INDEX_NB),
+VMSTATE_UINT16_ARRAY(vbe_regs, VGACommonState, 
VBE_DISPI_INDEX_NB_VMSTATE),
 VMSTATE_UINT32(vbe_start_addr, VGACommonState),
 VMSTATE_UINT32(vbe_line_offset, VGACommonState),
 VMSTATE_UINT32(vbe_bank_mask, VGACommonState),
diff --git a/hw/vga_int.h b/hw/vga_int.h
index 23a42ef..c3c5e21 100644
--- a/hw/vga_int.h
+++ b/hw/vga_int.h
@@ -47,13 +47,17 @@
 #define VBE_DISPI_INDEX_VIRT_HEIGHT 0x7
 #define VBE_DISPI_INDEX_X_OFFSET0x8
 #define VBE_DISPI_INDEX_Y_OFFSET0x9
-#define VBE_DISPI_INDEX_NB  0xa
+#define VBE_DISPI_INDEX_VIDEO_MEMORY_64K 0xa
+
+#define VBE_DISPI_INDEX_NB_VMSTATE  0xa
+#define VBE_DISPI_INDEX_NB  0xb
 
 #define VBE_DISPI_ID0   0xB0C0
 #define VBE_DISPI_ID1   0xB0C1
 #define VBE_DISPI_ID2   0xB0C2
 #define VBE_DISPI_ID3   0xB0C3
 #define VBE_DISPI_ID4   0xB0C4
+#define VBE_DISPI_ID5   0xB0C5
 
 #define VBE_DISPI_DISABLED  0x00
 #define VBE_DISPI_ENABLED   0x01
-- 
1.6.6.1





Re: [Qemu-devel] Re: [libvirt] Supporting hypervisor specific APIs in libvirt

2010-03-24 Thread Avi Kivity

On 03/24/2010 06:42 PM, Luiz Capitulino wrote:

On Wed, 24 Mar 2010 12:42:16 +0200
Avi Kivitya...@redhat.com  wrote:

   

So, at best qemud is a toy for people who are annoyed by libvirt.
 

  Is the reason for doing this in qemu because libvirt is annoying?


Mostly.


I don't see
how adding yet another layer/daemon is going to improve ours and user's life
(the same applies for libqemu).
   


libvirt becomes optional.


  If I got it right, there were two complaints from the kvm-devel flamewar:

1. Qemu has usability problems
2. There's no way an external tool can get /proc/kallsyms info from Qemu

  I don't see how libqemu can help with 1) and having qemud doesn't seem
the best solution for 2) either.

  Still talking about 2), what's wrong in getting the PID or having a QMP
connection in a well known location as suggested by Anthony?
   


I now believe that's the best option.

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.





[Qemu-devel] Re: Compile files only once: some planning

2010-03-24 Thread Blue Swirl
On 3/24/10, Michael S. Tsirkin m...@redhat.com wrote:
 On Tue, Mar 23, 2010 at 11:43:51PM +0200, Blue Swirl wrote:
   rtl8139.c, e1000.c: need to convert ldl/stl to 
 cpu_physical_memory_read/write.


 I don't see how it would help. These still get target_phys_addr_t which
  is per-target. Further, a ton of devices do
  cpu_register_physical_memory/qemu_register_coalesced_mmio.
  These are also per target.

I don't know what I was eating yesterday: there are no references to
ldl/stl in either rtl8139 or e1000. In fact, the conversion is simple
for the device itself, just add a property be. The attached patch
performs this part.

But now there is a bigger problem, how to pass the property to the
device. It's not fair to require the user to remember to set it.

  A simple solution would be to change all of cpu_XX functions to
  get a 64 bit address. This is a lot of churn, if we do this
  anyway we should also pass length to callbacks, this way
  rwhandler will get very small or go away completely.

It's not too much effort to keep the target_phys_addr_t type.
From e0ab5cc41c68207be558ccb330f4fb83fba4ee6f Mon Sep 17 00:00:00 2001
From: Blue Swirl blauwir...@gmail.com
Date: Wed, 24 Mar 2010 19:54:05 +
Subject: [PATCH] Compile rtl8139 and e1000 only once

WIP

Signed-off-by: Blue Swirl blauwir...@gmail.com
---
 Makefile.objs   |2 +
 Makefile.target |4 --
 hw/e1000.c  |  108 ++
 hw/rtl8139.c|   82 +++---
 4 files changed, 147 insertions(+), 49 deletions(-)

diff --git a/Makefile.objs b/Makefile.objs
index 281f7a6..54895f8 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -155,6 +155,8 @@ hw-obj-y += msix.o
 hw-obj-y += ne2000.o
 hw-obj-y += eepro100.o
 hw-obj-y += pcnet.o
+hw-obj-y += rtl8139.o
+hw-obj-y += e1000.o
 
 hw-obj-$(CONFIG_SMC91C111) += smc91c111.o
 hw-obj-$(CONFIG_LAN9118) += lan9118.o
diff --git a/Makefile.target b/Makefile.target
index eb4d010..1a86fc4 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -176,10 +176,6 @@ QEMU_CFLAGS += $(VNC_SASL_CFLAGS)
 # xen backend driver support
 obj-$(CONFIG_XEN) += xen_machine_pv.o xen_domainbuild.o
 
-# PCI network cards
-obj-y += rtl8139.o
-obj-y += e1000.o
-
 # Hardware support
 obj-i386-y = ide/core.o
 obj-i386-y += pckbd.o dma.o
diff --git a/hw/e1000.c b/hw/e1000.c
index fd3059a..0f72db8 100644
--- a/hw/e1000.c
+++ b/hw/e1000.c
@@ -121,6 +121,7 @@ typedef struct E1000State_st {
 uint16_t reading;
 uint32_t old_eecd;
 } eecd_state;
+uint32_t be;
 } E1000State;
 
 #define	defreg(x)	x = (E1000_##x2)
@@ -825,14 +826,11 @@ static void (*macreg_writeops[])(E1000State *, int, uint32_t) = {
 enum { NWRITEOPS = ARRAY_SIZE(macreg_writeops) };
 
 static void
-e1000_mmio_writel(void *opaque, target_phys_addr_t addr, uint32_t val)
+e1000_mmio_writel_le(void *opaque, target_phys_addr_t addr, uint32_t val)
 {
 E1000State *s = opaque;
 unsigned int index = (addr  0x1)  2;
 
-#ifdef TARGET_WORDS_BIGENDIAN
-val = bswap32(val);
-#endif
 if (index  NWRITEOPS  macreg_writeops[index])
 macreg_writeops[index](s, index, val);
 else if (index  NREADOPS  macreg_readops[index])
@@ -841,25 +839,47 @@ e1000_mmio_writel(void *opaque, target_phys_addr_t addr, uint32_t val)
 DBGOUT(UNKNOWN, MMIO unknown write addr=0x%08x,val=0x%08x\n,
index2, val);
 }
+static void
+e1000_mmio_writel_be(void *opaque, target_phys_addr_t addr, uint32_t val)
+{
+val = bswap32(val);
+e1000_mmio_writel_le(opaque, addr, val);
+}
 
 static void
-e1000_mmio_writew(void *opaque, target_phys_addr_t addr, uint32_t val)
+e1000_mmio_writew_le(void *opaque, target_phys_addr_t addr, uint32_t val)
 {
 // emulate hw without byte enables: no RMW
-e1000_mmio_writel(opaque, addr  ~3,
-  (val  0x)  (8*(addr  3)));
+e1000_mmio_writel_le(opaque, addr  ~3,
+ (val  0x)  (8*(addr  3)));
 }
 
 static void
-e1000_mmio_writeb(void *opaque, target_phys_addr_t addr, uint32_t val)
+e1000_mmio_writew_be(void *opaque, target_phys_addr_t addr, uint32_t val)
 {
 // emulate hw without byte enables: no RMW
-e1000_mmio_writel(opaque, addr  ~3,
-  (val  0xff)  (8*(addr  3)));
+e1000_mmio_writel_be(opaque, addr  ~3,
+ (val  0x)  (8*(addr  3)));
+}
+
+static void
+e1000_mmio_writeb_be(void *opaque, target_phys_addr_t addr, uint32_t val)
+{
+// emulate hw without byte enables: no RMW
+e1000_mmio_writel_be(opaque, addr  ~3,
+ (val  0xff)  (8*(addr  3)));
+}
+
+static void
+e1000_mmio_writeb_le(void *opaque, target_phys_addr_t addr, uint32_t val)
+{
+// emulate hw without byte enables: no RMW
+e1000_mmio_writel_le(opaque, addr  ~3,
+ (val  0xff)  (8*(addr  3)));
 }
 
 static uint32_t
-e1000_mmio_readl(void *opaque, target_phys_addr_t addr)
+e1000_mmio_readl_le(void 

Re: [Qemu-devel] Re: [libvirt] Supporting hypervisor specific APIs in libvirt

2010-03-24 Thread Luiz Capitulino
On Wed, 24 Mar 2010 21:49:45 +0200
Avi Kivity a...@redhat.com wrote:

 On 03/24/2010 06:42 PM, Luiz Capitulino wrote:
  On Wed, 24 Mar 2010 12:42:16 +0200
  Avi Kivitya...@redhat.com  wrote:
 
 
  So, at best qemud is a toy for people who are annoyed by libvirt.
   
Is the reason for doing this in qemu because libvirt is annoying?
 
 Mostly.
 
  I don't see
  how adding yet another layer/daemon is going to improve ours and user's life
  (the same applies for libqemu).
 
 
 libvirt becomes optional.

 I think it should only be optional if all you want is to run a single VM
in this case what seems to be missing on our side is a _real_ GUI, bundled
with QEMU potentially written in a high-level language.

 Then we make virt-manager optional and this is good because we can sync
features way faster and we don't have to care about _managing_ several
VMs, our world in terms of usability and maintainability is about one VM.

 IMVHO, everything else should be done by third-party tools like libvirt,
we just provide the means for it.




Re: [Qemu-devel] Re: Compile files only once: some planning

2010-03-24 Thread Richard Henderson
On 03/24/2010 10:07 AM, Richard Henderson wrote:
 struct CPUSmallCommonState
 {
 // most of the stuff from CPU_COMMON.
 // sorted for some thought of padding elimination.  ;-)
 };
 
 struct CPULargeCommonState
 {
 CPUTLBEntry tlb_table[NB_MMU_MODES][CPU_TLB_SIZE];
 target_phys_addr_t iotlb[NB_MMU_MODES][CPU_TLB_SIZE];
 struct TranslationBlock *tb_jmp_cache[TB_JMP_CACHE_SIZE];
 jmp_buf jmp_env;
 };
...
 Now.  If you're compiling a file for which cpu-specific code is ok:
 
 register CPUXYZLargeState *env __asm__(AREG0);
 #define ENV_SMALL_COMMON_STATE(env-s.common_s)
 #define ENV_LARGE_COMMON_STATE(env-common_l)
 
 If you're compiling a file which is supposed to be independant of cpu:
 
 register CPUSmallCommonState *env __asm__(AREG0);
 #define ENV_SMALL_COMMON_STATE(env)
 #define ENV_LARGE_COMMON_STATE((CPULargeCommonState *)((char *)env + 
 cpu_large_state_offset))

On 03/24/2010 11:00 AM, Blue Swirl wrote:
 One trick is to define a fixed offset (about half CPUState size) and
 make env point to CPUState + offset. Then the negative part of the
 offset space can be used efficiently. But this just doubles the space
 that can be accessed fast, so it's not a big win.

A good idea.  We can eliminate the cpu_large_state_offset from above via:

struct CPUSmallCommonState
{
// most of the stuff from CPU_COMMON.
} __attribute__((aligned));

struct CPUXYZPrivateState
{
// all the cpu-specific stuff
};

struct CPUXYZSmallState
{
CPUXYZPrivateState p;
CPUSmallCommonState s;
};

struct CPUXYZAllState
{
CPUXYZSmallState s;
CPULargeCommonState l;  // ARG0 register points here.
};

register void *biased_env __asm__(AREG0);

static inline CPUXYZPrivateState *get_env_cpu_private(void)
{
return ((CPUXYZSmallState *)biased_env - 1)-p;
}

static inline CPUSmallCommonState *get_env_common_small(void)
{
return (CPUSmallCommonState *)biased_env - 1;
}

static inline CPULargeCommonState *get_env_common_large(void)
{
return (CPULargeCommonState *)biased_env;
}


r~




[Qemu-devel] Re: Compile files only once: some planning

2010-03-24 Thread Michael S. Tsirkin
On Wed, Mar 24, 2010 at 10:05:10PM +0200, Blue Swirl wrote:
 On 3/24/10, Michael S. Tsirkin m...@redhat.com wrote:
  On Tue, Mar 23, 2010 at 11:43:51PM +0200, Blue Swirl wrote:
rtl8139.c, e1000.c: need to convert ldl/stl to 
  cpu_physical_memory_read/write.
 
 
  I don't see how it would help. These still get target_phys_addr_t which
   is per-target. Further, a ton of devices do
   cpu_register_physical_memory/qemu_register_coalesced_mmio.
   These are also per target.
 
 I don't know what I was eating yesterday: there are no references to
 ldl/stl in either rtl8139 or e1000. In fact, the conversion is simple
 for the device itself, just add a property be. The attached patch
 performs this part.
 
 But now there is a bigger problem, how to pass the property to the
 device. It's not fair to require the user to remember to set it.

I still don't fully understand how come e1000 cares about
target endianness.

   A simple solution would be to change all of cpu_XX functions to
   get a 64 bit address. This is a lot of churn, if we do this
   anyway we should also pass length to callbacks, this way
   rwhandler will get very small or go away completely.
 
 It's not too much effort to keep the target_phys_addr_t type.

I don't understand - target_phys_addr_t is different for different
targets to we will need to recompile the code for each target.
What am I missing?


 From e0ab5cc41c68207be558ccb330f4fb83fba4ee6f Mon Sep 17 00:00:00 2001
 From: Blue Swirl blauwir...@gmail.com
 Date: Wed, 24 Mar 2010 19:54:05 +
 Subject: [PATCH] Compile rtl8139 and e1000 only once
 
 WIP
 
 Signed-off-by: Blue Swirl blauwir...@gmail.com
 ---
  Makefile.objs   |2 +
  Makefile.target |4 --
  hw/e1000.c  |  108 ++
  hw/rtl8139.c|   82 +++---
  4 files changed, 147 insertions(+), 49 deletions(-)
 
 diff --git a/Makefile.objs b/Makefile.objs
 index 281f7a6..54895f8 100644
 --- a/Makefile.objs
 +++ b/Makefile.objs
 @@ -155,6 +155,8 @@ hw-obj-y += msix.o
  hw-obj-y += ne2000.o
  hw-obj-y += eepro100.o
  hw-obj-y += pcnet.o
 +hw-obj-y += rtl8139.o
 +hw-obj-y += e1000.o
  
  hw-obj-$(CONFIG_SMC91C111) += smc91c111.o
  hw-obj-$(CONFIG_LAN9118) += lan9118.o
 diff --git a/Makefile.target b/Makefile.target
 index eb4d010..1a86fc4 100644
 --- a/Makefile.target
 +++ b/Makefile.target
 @@ -176,10 +176,6 @@ QEMU_CFLAGS += $(VNC_SASL_CFLAGS)
  # xen backend driver support
  obj-$(CONFIG_XEN) += xen_machine_pv.o xen_domainbuild.o
  
 -# PCI network cards
 -obj-y += rtl8139.o
 -obj-y += e1000.o
 -
  # Hardware support
  obj-i386-y = ide/core.o
  obj-i386-y += pckbd.o dma.o
 diff --git a/hw/e1000.c b/hw/e1000.c
 index fd3059a..0f72db8 100644
 --- a/hw/e1000.c
 +++ b/hw/e1000.c
 @@ -121,6 +121,7 @@ typedef struct E1000State_st {
  uint16_t reading;
  uint32_t old_eecd;
  } eecd_state;
 +uint32_t be;
  } E1000State;
  
  #define  defreg(x)   x = (E1000_##x2)
 @@ -825,14 +826,11 @@ static void (*macreg_writeops[])(E1000State *, int, 
 uint32_t) = {
  enum { NWRITEOPS = ARRAY_SIZE(macreg_writeops) };
  
  static void
 -e1000_mmio_writel(void *opaque, target_phys_addr_t addr, uint32_t val)
 +e1000_mmio_writel_le(void *opaque, target_phys_addr_t addr, uint32_t val)
  {
  E1000State *s = opaque;
  unsigned int index = (addr  0x1)  2;
  
 -#ifdef TARGET_WORDS_BIGENDIAN
 -val = bswap32(val);
 -#endif
  if (index  NWRITEOPS  macreg_writeops[index])
  macreg_writeops[index](s, index, val);
  else if (index  NREADOPS  macreg_readops[index])
 @@ -841,25 +839,47 @@ e1000_mmio_writel(void *opaque, target_phys_addr_t 
 addr, uint32_t val)
  DBGOUT(UNKNOWN, MMIO unknown write addr=0x%08x,val=0x%08x\n,
 index2, val);
  }
 +static void
 +e1000_mmio_writel_be(void *opaque, target_phys_addr_t addr, uint32_t val)
 +{
 +val = bswap32(val);
 +e1000_mmio_writel_le(opaque, addr, val);
 +}
  
  static void
 -e1000_mmio_writew(void *opaque, target_phys_addr_t addr, uint32_t val)
 +e1000_mmio_writew_le(void *opaque, target_phys_addr_t addr, uint32_t val)
  {
  // emulate hw without byte enables: no RMW
 -e1000_mmio_writel(opaque, addr  ~3,
 -  (val  0x)  (8*(addr  3)));
 +e1000_mmio_writel_le(opaque, addr  ~3,
 + (val  0x)  (8*(addr  3)));
  }
  
  static void
 -e1000_mmio_writeb(void *opaque, target_phys_addr_t addr, uint32_t val)
 +e1000_mmio_writew_be(void *opaque, target_phys_addr_t addr, uint32_t val)
  {
  // emulate hw without byte enables: no RMW
 -e1000_mmio_writel(opaque, addr  ~3,
 -  (val  0xff)  (8*(addr  3)));
 +e1000_mmio_writel_be(opaque, addr  ~3,
 + (val  0x)  (8*(addr  3)));
 +}
 +
 +static void
 +e1000_mmio_writeb_be(void *opaque, target_phys_addr_t addr, uint32_t val)
 +{
 +// emulate hw without byte enables: no RMW
 +

[Qemu-devel] Re: Compile files only once: some planning

2010-03-24 Thread Blue Swirl
On 3/24/10, Michael S. Tsirkin m...@redhat.com wrote:
 On Wed, Mar 24, 2010 at 10:05:10PM +0200, Blue Swirl wrote:
   On 3/24/10, Michael S. Tsirkin m...@redhat.com wrote:
On Tue, Mar 23, 2010 at 11:43:51PM +0200, Blue Swirl wrote:
  rtl8139.c, e1000.c: need to convert ldl/stl to 
 cpu_physical_memory_read/write.
   
   
I don't see how it would help. These still get target_phys_addr_t which
 is per-target. Further, a ton of devices do
 cpu_register_physical_memory/qemu_register_coalesced_mmio.
 These are also per target.
  
   I don't know what I was eating yesterday: there are no references to
   ldl/stl in either rtl8139 or e1000. In fact, the conversion is simple
   for the device itself, just add a property be. The attached patch
   performs this part.
  
   But now there is a bigger problem, how to pass the property to the
   device. It's not fair to require the user to remember to set it.


 I still don't fully understand how come e1000 cares about
  target endianness.

It shouldn't. Maybe the real fix is to remove the byte swapping.

 A simple solution would be to change all of cpu_XX functions to
 get a 64 bit address. This is a lot of churn, if we do this
 anyway we should also pass length to callbacks, this way
 rwhandler will get very small or go away completely.
  
   It's not too much effort to keep the target_phys_addr_t type.


 I don't understand - target_phys_addr_t is different for different
  targets to we will need to recompile the code for each target.
  What am I missing?

target_phys_addr_t is 64 bit on a 64 bit host, on a 32 bit host it's
size will be either 64 or 32 bits depending on the target. So the
files are compiled once on 64 bit host, twice on 32 bit host if both
32 and 64 bits targets are selected.

   From e0ab5cc41c68207be558ccb330f4fb83fba4ee6f Mon Sep 17 00:00:00 2001
   From: Blue Swirl blauwir...@gmail.com
   Date: Wed, 24 Mar 2010 19:54:05 +
   Subject: [PATCH] Compile rtl8139 and e1000 only once
  
   WIP
  
   Signed-off-by: Blue Swirl blauwir...@gmail.com
   ---
Makefile.objs   |2 +
Makefile.target |4 --
hw/e1000.c  |  108 
 ++
hw/rtl8139.c|   82 +++---
4 files changed, 147 insertions(+), 49 deletions(-)
  
   diff --git a/Makefile.objs b/Makefile.objs
   index 281f7a6..54895f8 100644
   --- a/Makefile.objs
   +++ b/Makefile.objs
   @@ -155,6 +155,8 @@ hw-obj-y += msix.o
hw-obj-y += ne2000.o
hw-obj-y += eepro100.o
hw-obj-y += pcnet.o
   +hw-obj-y += rtl8139.o
   +hw-obj-y += e1000.o
  
hw-obj-$(CONFIG_SMC91C111) += smc91c111.o
hw-obj-$(CONFIG_LAN9118) += lan9118.o
   diff --git a/Makefile.target b/Makefile.target
   index eb4d010..1a86fc4 100644
   --- a/Makefile.target
   +++ b/Makefile.target
   @@ -176,10 +176,6 @@ QEMU_CFLAGS += $(VNC_SASL_CFLAGS)
# xen backend driver support
obj-$(CONFIG_XEN) += xen_machine_pv.o xen_domainbuild.o
  
   -# PCI network cards
   -obj-y += rtl8139.o
   -obj-y += e1000.o
   -
# Hardware support
obj-i386-y = ide/core.o
obj-i386-y += pckbd.o dma.o
   diff --git a/hw/e1000.c b/hw/e1000.c
   index fd3059a..0f72db8 100644
   --- a/hw/e1000.c
   +++ b/hw/e1000.c
   @@ -121,6 +121,7 @@ typedef struct E1000State_st {
uint16_t reading;
uint32_t old_eecd;
} eecd_state;
   +uint32_t be;
} E1000State;
  
#define  defreg(x)   x = (E1000_##x2)
   @@ -825,14 +826,11 @@ static void (*macreg_writeops[])(E1000State *, int, 
 uint32_t) = {
enum { NWRITEOPS = ARRAY_SIZE(macreg_writeops) };
  
static void
   -e1000_mmio_writel(void *opaque, target_phys_addr_t addr, uint32_t val)
   +e1000_mmio_writel_le(void *opaque, target_phys_addr_t addr, uint32_t val)
{
E1000State *s = opaque;
unsigned int index = (addr  0x1)  2;
  
   -#ifdef TARGET_WORDS_BIGENDIAN
   -val = bswap32(val);
   -#endif
if (index  NWRITEOPS  macreg_writeops[index])
macreg_writeops[index](s, index, val);
else if (index  NREADOPS  macreg_readops[index])
   @@ -841,25 +839,47 @@ e1000_mmio_writel(void *opaque, target_phys_addr_t 
 addr, uint32_t val)
DBGOUT(UNKNOWN, MMIO unknown write addr=0x%08x,val=0x%08x\n,
   index2, val);
}
   +static void
   +e1000_mmio_writel_be(void *opaque, target_phys_addr_t addr, uint32_t val)
   +{
   +val = bswap32(val);
   +e1000_mmio_writel_le(opaque, addr, val);
   +}
  
static void
   -e1000_mmio_writew(void *opaque, target_phys_addr_t addr, uint32_t val)
   +e1000_mmio_writew_le(void *opaque, target_phys_addr_t addr, uint32_t val)
{
// emulate hw without byte enables: no RMW
   -e1000_mmio_writel(opaque, addr  ~3,
   -  (val  0x)  (8*(addr  3)));
   +e1000_mmio_writel_le(opaque, addr  ~3,
   + (val  0x)  (8*(addr  

[Qemu-devel] Re: Compile files only once: some planning

2010-03-24 Thread Michael S. Tsirkin
On Wed, Mar 24, 2010 at 10:24:12PM +0200, Blue Swirl wrote:
 On 3/24/10, Michael S. Tsirkin m...@redhat.com wrote:
  On Wed, Mar 24, 2010 at 10:05:10PM +0200, Blue Swirl wrote:
On 3/24/10, Michael S. Tsirkin m...@redhat.com wrote:
 On Tue, Mar 23, 2010 at 11:43:51PM +0200, Blue Swirl wrote:
   rtl8139.c, e1000.c: need to convert ldl/stl to 
  cpu_physical_memory_read/write.


 I don't see how it would help. These still get target_phys_addr_t which
  is per-target. Further, a ton of devices do
  cpu_register_physical_memory/qemu_register_coalesced_mmio.
  These are also per target.
   
I don't know what I was eating yesterday: there are no references to
ldl/stl in either rtl8139 or e1000. In fact, the conversion is simple
for the device itself, just add a property be. The attached patch
performs this part.
   
But now there is a bigger problem, how to pass the property to the
device. It's not fair to require the user to remember to set it.
 
 
  I still don't fully understand how come e1000 cares about
   target endianness.
 
 It shouldn't. Maybe the real fix is to remove the byte swapping.

Presumably it's there for a reason?

  A simple solution would be to change all of cpu_XX functions to
  get a 64 bit address. This is a lot of churn, if we do this
  anyway we should also pass length to callbacks, this way
  rwhandler will get very small or go away completely.
   
It's not too much effort to keep the target_phys_addr_t type.
 
 
  I don't understand - target_phys_addr_t is different for different
   targets to we will need to recompile the code for each target.
   What am I missing?
 
 target_phys_addr_t is 64 bit on a 64 bit host, on a 32 bit host it's
 size will be either 64 or 32 bits depending on the target. So the
 files are compiled once on 64 bit host, twice on 32 bit host if both
 32 and 64 bits targets are selected.

How about just making it 64 bit unconditionally?
How much do we save by using a 32 bit address value?

-- 
MST




Re: [Qemu-devel] Re: Compile files only once: some planning

2010-03-24 Thread Anthony Liguori

On 03/24/2010 03:24 PM, Blue Swirl wrote:

On 3/24/10, Michael S. Tsirkinm...@redhat.com  wrote:
   

On Wed, Mar 24, 2010 at 10:05:10PM +0200, Blue Swirl wrote:
On 3/24/10, Michael S. Tsirkinm...@redhat.com  wrote:
  On Tue, Mar 23, 2010 at 11:43:51PM +0200, Blue Swirl wrote:
 rtl8139.c, e1000.c: need to convert ldl/stl to 
cpu_physical_memory_read/write.


  I don't see how it would help. These still get target_phys_addr_t which
   is per-target. Further, a ton of devices do
   cpu_register_physical_memory/qemu_register_coalesced_mmio.
   These are also per target.
  
I don't know what I was eating yesterday: there are no references to
ldl/stl in either rtl8139 or e1000. In fact, the conversion is simple
for the device itself, just add a property be. The attached patch
performs this part.
  
But now there is a bigger problem, how to pass the property to the
device. It's not fair to require the user to remember to set it.


I still don't fully understand how come e1000 cares about
  target endianness.
 

It shouldn't. Maybe the real fix is to remove the byte swapping.
   


My previous pci memory functions patches removed the byte swapping.

The problem is that PCI devices are going to operate in little endian 
mode (usually) whereas the CPU may be acting in big endian mode.  We 
need to do a byte swap somewhere but the better place to do it is in the 
PCI bus layer.


Regards,

Anthony Liguori




[Qemu-devel] Re: Compile files only once: some planning

2010-03-24 Thread Blue Swirl
On 3/24/10, Michael S. Tsirkin m...@redhat.com wrote:
 On Wed, Mar 24, 2010 at 10:24:12PM +0200, Blue Swirl wrote:
   On 3/24/10, Michael S. Tsirkin m...@redhat.com wrote:
On Wed, Mar 24, 2010 at 10:05:10PM +0200, Blue Swirl wrote:
  On 3/24/10, Michael S. Tsirkin m...@redhat.com wrote:
   On Tue, Mar 23, 2010 at 11:43:51PM +0200, Blue Swirl wrote:
 rtl8139.c, e1000.c: need to convert ldl/stl to 
 cpu_physical_memory_read/write.
  
  
   I don't see how it would help. These still get target_phys_addr_t 
 which
is per-target. Further, a ton of devices do
cpu_register_physical_memory/qemu_register_coalesced_mmio.
These are also per target.
 
  I don't know what I was eating yesterday: there are no references to
  ldl/stl in either rtl8139 or e1000. In fact, the conversion is simple
  for the device itself, just add a property be. The attached patch
  performs this part.
 
  But now there is a bigger problem, how to pass the property to the
  device. It's not fair to require the user to remember to set it.
   
   
I still don't fully understand how come e1000 cares about
 target endianness.
  
   It shouldn't. Maybe the real fix is to remove the byte swapping.


 Presumably it's there for a reason?


A simple solution would be to change all of cpu_XX functions to
get a 64 bit address. This is a lot of churn, if we do this
anyway we should also pass length to callbacks, this way
rwhandler will get very small or go away completely.
 
  It's not too much effort to keep the target_phys_addr_t type.
   
   
I don't understand - target_phys_addr_t is different for different
 targets to we will need to recompile the code for each target.
 What am I missing?
  
   target_phys_addr_t is 64 bit on a 64 bit host, on a 32 bit host it's
   size will be either 64 or 32 bits depending on the target. So the
   files are compiled once on 64 bit host, twice on 32 bit host if both
   32 and 64 bits targets are selected.


 How about just making it 64 bit unconditionally?
  How much do we save by using a 32 bit address value?

On a 32 bit host, probably a lot because of register pressure. And
it's not too much effort to keep the target_phys_addr_t type logic.




[Qemu-devel] Re: [PATCH 10/15] virtio-serial: Add QMP events for failed port/device add

2010-03-24 Thread Luiz Capitulino
On Wed, 24 Mar 2010 20:19:28 +0530
Amit Shah amit.s...@redhat.com wrote:

 When adding a port or a device to the guest fails, management software
 might be interested in knowing and then cleaning up the host-side of the
 port. Introduce QMP events to signal such errors.

 I'm completely unfamiliar with virtio-serial, so let me ask: how are ports
added? I'd expect the command performing this operation to fail in this case.





Re: [Qemu-devel] [PATCH] qemu: jaso-parser: Output the content of invalid keyword

2010-03-24 Thread Luiz Capitulino
On Wed, 24 Mar 2010 17:00:14 +0100
Markus Armbruster arm...@redhat.com wrote:

 Amos Kong ak...@redhat.com writes:
 
  When input some invialid word 'unknowcmd' through QMP port, qemu outputs 
  this
  error message:
  parse error: invalid keyword `%s'
  This patch makes qemu output the content of invalid keyword, like:
  parse error: invalid keyword `unknowcmd'
 
  Signed-off-by: Amos Kong ak...@redhat.com
 
 Looks good to me.
 
 Hint: it's best to put a version in the subject when you respin, like
 [PATCH v2] ...

 Yes, and maintainers may miss a patch down a thread (and it's a good
opportunity to fix the subject).




[Qemu-devel] x86_64: iret in long mode resets %fs and %gs base (doesn't on real CPUs)

2010-03-24 Thread Vegard Nossum
Hi,

I've been investigating why some of my code failed on qemu, but
succeeded in bochs and on real hardware. In particular, it turns out
that qemu would reset the FS/GS_BASE_MSR whenever I did iret from ring
0 to 3.

I traced it down to this bit of code (in target-i386/op_helper.c):

static inline void validate_seg(int seg_reg, int cpl)
{
int dpl;
uint32_t e2;

/* XXX: on x86_64, we do not want to nullify FS and GS because
   they may still contain a valid base. I would be interested to
   know how a real x86_64 CPU behaves */
if ((seg_reg == R_FS || seg_reg == R_GS) 
(env-segs[seg_reg].selector  0xfffc) == 0)
return;

So the reason why this didn't work in qemu for me was that I was
loading the selector as 8 -- which fails the above test and
validate_seg() proceeds to clear the segment base value. Changing my
own code to only load 0 into %gs from the start fixed the problem for
me.

However, qemu is clearly doing something differently from the real
hardware. I tested both versions (loading 0 or 8 into %gs) on my Intel
P4, and GS_BASE_MSR is preserved in both cases. Perhaps the condition
on the selector value should be removed? (Or perhaps the calls to
validate_seg() for R_FS/R_GS should be removed from
helper_ret_protected()?)

Just a heads up.


Vegard




[Qemu-devel] [PATCH] Compile ide/core only once

2010-03-24 Thread Blue Swirl
Make win2k install hack unconditional as it is still restricted to
x86 only in vl.c.

Replace TARGET_PAGE_SIZE with 4096 because that figure is already
used later.

Signed-off-by: Blue Swirl blauwir...@gmail.com
---
 Makefile.objs|1 +
 Makefile.target  |   11 ---
 default-configs/arm-softmmu.mak  |1 +
 default-configs/i386-softmmu.mak |1 +
 default-configs/mips-softmmu.mak |1 +
 default-configs/mips64-softmmu.mak   |1 +
 default-configs/mips64el-softmmu.mak |1 +
 default-configs/mipsel-softmmu.mak   |1 +
 default-configs/ppc-softmmu.mak  |1 +
 default-configs/ppc64-softmmu.mak|1 +
 default-configs/ppcemb-softmmu.mak   |1 +
 default-configs/sh4-softmmu.mak  |1 +
 default-configs/sh4eb-softmmu.mak|1 +
 default-configs/sparc64-softmmu.mak  |1 +
 default-configs/x86_64-softmmu.mak   |1 +
 hw/ide/core.c|   10 +++---
 vl.c |2 +-
 17 files changed, 22 insertions(+), 15 deletions(-)

diff --git a/Makefile.objs b/Makefile.objs
index 281f7a6..fe81f6c 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -161,6 +161,7 @@ hw-obj-$(CONFIG_LAN9118) += lan9118.o
 hw-obj-$(CONFIG_NE2000_ISA) += ne2000-isa.o

 # IDE
+hw-obj-$(CONFIG_IDE_CORE) += ide/core.o
 hw-obj-$(CONFIG_IDE_QDEV) += ide/qdev.o
 hw-obj-$(CONFIG_IDE_PCI) += ide/pci.o
 hw-obj-$(CONFIG_IDE_ISA) += ide/isa.o
diff --git a/Makefile.target b/Makefile.target
index eb4d010..a17de90 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -181,8 +181,7 @@ obj-y += rtl8139.o
 obj-y += e1000.o

 # Hardware support
-obj-i386-y = ide/core.o
-obj-i386-y += pckbd.o dma.o
+obj-i386-y = pckbd.o dma.o
 obj-i386-y += vga.o
 obj-i386-y += mc146818rtc.o i8259.o pc.o
 obj-i386-y += cirrus_vga.o apic.o ioapic.o acpi.o piix_pci.o
@@ -191,7 +190,7 @@ obj-i386-y += device-hotplug.o pci-hotplug.o
smbios.o wdt_ib700.o
 obj-i386-y += debugcon.o multiboot.o

 # shared objects
-obj-ppc-y = ppc.o ide/core.o ide/macio.o
+obj-ppc-y = ppc.o ide/macio.o
 obj-ppc-y += vga.o dma.o openpic.o
 # PREP target
 obj-ppc-y += pckbd.o i8259.o mc146818rtc.o
@@ -215,7 +214,6 @@ obj-mips-y += mips_addr.o mips_timer.o mips_int.o
 obj-mips-y += dma.o vga.o i8259.o rc4030.o
 obj-mips-y += vga-isa-mm.o
 obj-mips-y += g364fb.o jazz_led.o dp8393x.o
-obj-mips-y += ide/core.o
 obj-mips-y += gt64xxx.o pckbd.o mc146818rtc.o acpi.o ds1225y.o
 obj-mips-y += piix4.o cirrus_vga.o
 obj-mips-y += mipsnet.o
@@ -248,7 +246,6 @@ obj-cris-y += pflash_cfi02.o

 ifeq ($(TARGET_ARCH), sparc64)
 obj-sparc-y = sun4u.o pckbd.o apb_pci.o
-obj-sparc-y += ide/core.o
 obj-sparc-y += vga.o
 obj-sparc-y += mc146818rtc.o
 obj-sparc-y += cirrus_vga.o
@@ -268,7 +265,7 @@ obj-arm-y += arm-semi.o
 obj-arm-y += pxa2xx.o pxa2xx_pic.o pxa2xx_gpio.o pxa2xx_timer.o pxa2xx_dma.o
 obj-arm-y += pxa2xx_lcd.o pxa2xx_mmci.o pxa2xx_pcmcia.o pxa2xx_keypad.o
 obj-arm-y += pflash_cfi01.o gumstix.o
-obj-arm-y += zaurus.o ide/core.o ide/microdrive.o spitz.o tosa.o tc6393xb.o
+obj-arm-y += zaurus.o ide/microdrive.o spitz.o tosa.o tc6393xb.o
 obj-arm-y += omap1.o omap_lcdc.o omap_dma.o omap_clk.o omap_mmc.o omap_i2c.o
 obj-arm-y += omap2.o omap_dss.o soc_dma.o
 obj-arm-y += omap_sx1.o palm.o tsc210x.o
@@ -282,7 +279,7 @@ obj-arm-y += syborg_virtio.o

 obj-sh4-y = shix.o r2d.o sh7750.o sh7750_regnames.o tc58128.o
 obj-sh4-y += sh_timer.o sh_serial.o sh_intc.o sh_pci.o sm501.o
-obj-sh4-y += ide/core.o ide/mmio.o
+obj-sh4-y += ide/mmio.o

 obj-m68k-y = an5206.o mcf5206.o mcf_uart.o mcf_intc.o mcf5208.o mcf_fec.o
 obj-m68k-y += m68k-semi.o dummy_m68k.o
diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
index 02ad192..ea878a4 100644
--- a/default-configs/arm-softmmu.mak
+++ b/default-configs/arm-softmmu.mak
@@ -8,6 +8,7 @@ CONFIG_ECC=y
 CONFIG_SERIAL=y
 CONFIG_PTIMER=y
 CONFIG_SD=y
+CONFIG_IDE_CORE=y
 CONFIG_MAX7310=y
 CONFIG_WM8750=y
 CONFIG_TWL92230=y
diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
index 4dbf656..59eb670 100644
--- a/default-configs/i386-softmmu.mak
+++ b/default-configs/i386-softmmu.mak
@@ -9,6 +9,7 @@ CONFIG_I8254=y
 CONFIG_PCSPK=y
 CONFIG_USB_UHCI=y
 CONFIG_FDC=y
+CONFIG_IDE_CORE=y
 CONFIG_IDE_QDEV=y
 CONFIG_IDE_PCI=y
 CONFIG_IDE_ISA=y
diff --git a/default-configs/mips-softmmu.mak b/default-configs/mips-softmmu.mak
index 345a093..cb48ed1 100644
--- a/default-configs/mips-softmmu.mak
+++ b/default-configs/mips-softmmu.mak
@@ -10,6 +10,7 @@ CONFIG_I8254=y
 CONFIG_PCSPK=y
 CONFIG_USB_UHCI=y
 CONFIG_FDC=y
+CONFIG_IDE_CORE=y
 CONFIG_IDE_QDEV=y
 CONFIG_IDE_PCI=y
 CONFIG_IDE_ISA=y
diff --git a/default-configs/mips64-softmmu.mak
b/default-configs/mips64-softmmu.mak
index 5900ee6..585d6bb 100644
--- a/default-configs/mips64-softmmu.mak
+++ b/default-configs/mips64-softmmu.mak
@@ -10,6 +10,7 @@ CONFIG_I8254=y
 CONFIG_PCSPK=y
 CONFIG_USB_UHCI=y
 CONFIG_FDC=y
+CONFIG_IDE_CORE=y
 CONFIG_IDE_QDEV=y
 

Re: [Qemu-devel] Re: [libvirt] Supporting hypervisor specific APIs in libvirt

2010-03-24 Thread Alexander Graf

On 24.03.2010, at 21:32, Anthony Liguori wrote:

 On 03/24/2010 03:12 PM, Luiz Capitulino wrote:
 On Wed, 24 Mar 2010 21:49:45 +0200
 Avi Kivitya...@redhat.com  wrote:
 
   
 On 03/24/2010 06:42 PM, Luiz Capitulino wrote:
 
 On Wed, 24 Mar 2010 12:42:16 +0200
 Avi Kivitya...@redhat.com   wrote:
 
 
   
 So, at best qemud is a toy for people who are annoyed by libvirt.
 
 
   Is the reason for doing this in qemu because libvirt is annoying?
   
 Mostly.
 
 
 I don't see
 how adding yet another layer/daemon is going to improve ours and user's 
 life
 (the same applies for libqemu).
 
   
 libvirt becomes optional.
 
  I think it should only be optional if all you want is to run a single VM
 in this case what seems to be missing on our side is a _real_ GUI, bundled
 with QEMU potentially written in a high-level language.
   
 
 That's a separate problem.
 
  Then we make virt-manager optional and this is good because we can sync
 features way faster and we don't have to care about _managing_ several
 VMs, our world in terms of usability and maintainability is about one VM.
 
  IMVHO, everything else should be done by third-party tools like libvirt,
 we just provide the means for it.
   
 
 We need to have a common management interface for third party tools.  libvirt 
 cannot be that today because of the fact that it doesn't support all of our 
 features.  What we need to figure out is how we can work with the libvirt 
 team to fix this.

The feature problem isn't the only one. It's also about ease of use. I 
personally find the qemu command line easier to use than anything 
libvirt-derived.

 So far, a libqemu.so with a flexible transport that could be used directly by 
 a libvirt user (ala cairo/gdk type interactions) seems like the best solution 
 to me.

ACK.

One thing I was thinking is that it might be a good idea to split the qemu 
command line version off as well. The qemu backend would then only speak QMP 
with an empty device case and the actual qemu command would run that, connect 
to it via QMP and instanciate everything.

That way we'd get a consistent interface for management apps while keeping an 
easy to use CLI.


Alex



Re: [Qemu-devel] [PATCH 2/2] machine opts framework

2010-03-24 Thread Glauber Costa
On Wed, Mar 24, 2010 at 02:43:35PM -0500, Anthony Liguori wrote:
 On 03/24/2010 02:26 PM, Glauber Costa wrote:
 This patch adds initial support for the -machine option, that allows
 command line specification of machine attributes (always relying on safe
 defaults). Besides its value per-se, it is the saner way we found to
 allow for enabling/disabling of kvm's in-kernel irqchip.
 
 A machine with in-kernel-irqchip could be specified as:
  -machine irqchip=apic-kvm
 And one without it:
  -machine irqchip=apic
 
 To demonstrate how it'd work, this patch introduces a choice between
 pic and apic, pic being the old-style isa thing.
 
 I started from a different place.  See machine-qemuopts in my staging tree.
 
 I think we should combine efforts.
 
 Regards,
 

Absolutely. I see little overlap between what we did. Just a comment on yours:

-static void an5206_init(ram_addr_t ram_size,
- const char *boot_device,
- const char *kernel_filename, const char *kernel_cmdline,
- const char *initrd_filename, const char *cpu_model)
+static void an5206_init(QemuOpts *opts)
 {

Since we're changing init functions anyway, I believe we should also pass
a pointer to the machine structure. With that, we can avoing using the global
current_machine.




Re: [Qemu-devel] Re: Compile files only once: some planning

2010-03-24 Thread Aurelien Jarno
On Wed, Mar 24, 2010 at 10:24:12PM +0200, Blue Swirl wrote:
 On 3/24/10, Michael S. Tsirkin m...@redhat.com wrote:
  On Wed, Mar 24, 2010 at 10:05:10PM +0200, Blue Swirl wrote:
On 3/24/10, Michael S. Tsirkin m...@redhat.com wrote:
 On Tue, Mar 23, 2010 at 11:43:51PM +0200, Blue Swirl wrote:
   rtl8139.c, e1000.c: need to convert ldl/stl to 
  cpu_physical_memory_read/write.


 I don't see how it would help. These still get target_phys_addr_t which
  is per-target. Further, a ton of devices do
  cpu_register_physical_memory/qemu_register_coalesced_mmio.
  These are also per target.
   
I don't know what I was eating yesterday: there are no references to
ldl/stl in either rtl8139 or e1000. In fact, the conversion is simple
for the device itself, just add a property be. The attached patch
performs this part.
   
But now there is a bigger problem, how to pass the property to the
device. It's not fair to require the user to remember to set it.
 
 
  I still don't fully understand how come e1000 cares about
   target endianness.
 
 It shouldn't. Maybe the real fix is to remove the byte swapping.
 

The real fix is actually to add a layer handling bus byte swapping
depending on how bus are connected.

Currently it only works because all big endian boards QEMU emulates
need to byteswap bus access, and none of the little endian boards 
need to do that.

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net




Re: [Qemu-devel] [PATCH 2/2] machine opts framework

2010-03-24 Thread Anthony Liguori

On 03/24/2010 03:58 PM, Glauber Costa wrote:

On Wed, Mar 24, 2010 at 02:43:35PM -0500, Anthony Liguori wrote:
   

On 03/24/2010 02:26 PM, Glauber Costa wrote:
 

This patch adds initial support for the -machine option, that allows
command line specification of machine attributes (always relying on safe
defaults). Besides its value per-se, it is the saner way we found to
allow for enabling/disabling of kvm's in-kernel irqchip.

A machine with in-kernel-irqchip could be specified as:
-machine irqchip=apic-kvm
And one without it:
-machine irqchip=apic

To demonstrate how it'd work, this patch introduces a choice between
pic and apic, pic being the old-style isa thing.
   

I started from a different place.  See machine-qemuopts in my staging tree.

I think we should combine efforts.

Regards,

 

Absolutely. I see little overlap between what we did. Just a comment on yours:

-static void an5206_init(ram_addr_t ram_size,
- const char *boot_device,
- const char *kernel_filename, const char *kernel_cmdline,
- const char *initrd_filename, const char *cpu_model)
+static void an5206_init(QemuOpts *opts)
  {

Since we're changing init functions anyway, I believe we should also pass
a pointer to the machine structure. With that, we can avoing using the global
current_machine.
   


Yes, I had the same thought.  For instance, with isa-pc is just pc_init 
with an extra parameter.  If we had a structure like:


typedef struct QEMUPCMachine
{
   QEMUMachine parent;
   int pci_enabled;
} QEMUPCMachine;

Then you wouldn't need those dispatch functions.  Not a huge win for x86 
but for sparc and some arm boards, it's pretty significant.


Regards,

Anthony Liguori




Re: [Qemu-devel] Re: [libvirt] Supporting hypervisor specific APIs in libvirt

2010-03-24 Thread Luiz Capitulino
On Wed, 24 Mar 2010 21:54:09 +0100
Alexander Graf ag...@suse.de wrote:

 
 On 24.03.2010, at 21:32, Anthony Liguori wrote:
 
  On 03/24/2010 03:12 PM, Luiz Capitulino wrote:
  On Wed, 24 Mar 2010 21:49:45 +0200
  Avi Kivitya...@redhat.com  wrote:
  

  On 03/24/2010 06:42 PM, Luiz Capitulino wrote:
  
  On Wed, 24 Mar 2010 12:42:16 +0200
  Avi Kivitya...@redhat.com   wrote:
  
  

  So, at best qemud is a toy for people who are annoyed by libvirt.
  
  
Is the reason for doing this in qemu because libvirt is annoying?

  Mostly.
  
  
  I don't see
  how adding yet another layer/daemon is going to improve ours and user's 
  life
  (the same applies for libqemu).
  

  libvirt becomes optional.
  
   I think it should only be optional if all you want is to run a single VM
  in this case what seems to be missing on our side is a _real_ GUI, bundled
  with QEMU potentially written in a high-level language.

  
  That's a separate problem.
  
   Then we make virt-manager optional and this is good because we can sync
  features way faster and we don't have to care about _managing_ several
  VMs, our world in terms of usability and maintainability is about one VM.
  
   IMVHO, everything else should be done by third-party tools like libvirt,
  we just provide the means for it.

  
  We need to have a common management interface for third party tools.  
  libvirt cannot be that today because of the fact that it doesn't support 
  all of our features.  What we need to figure out is how we can work with 
  the libvirt team to fix this.
 
 The feature problem isn't the only one. It's also about ease of use. I 
 personally find the qemu command line easier to use than anything 
 libvirt-derived.

 Because your a developer and it does make sense to have a good CLI,
on the other hand we also have use cases for a GUI bundled in QEMU
and libvirt-derived things, which know how to deal with several
VMs and integrates well with lots of other things.




Re: [Qemu-devel] Re: [libvirt] Supporting hypervisor specific APIs in libvirt

2010-03-24 Thread Anthony Liguori

On 03/24/2010 04:25 PM, Luiz Capitulino wrote:


  I see it as a related problem, because what seems to be under discussion
is the quality of our interfaces with humans and tools.

  Also, when we were discussing the usuability problems I remember that
you

*WARNING: I might be wrong here, please correct me if so*

  you said that you don't push users to libvirt because it's out of sync with
our features.


Yes.


  The point is that, even if this true and even if we solve that,
I don't think it will solve the problem of a good experience for a
'single VM user', because libvirt is more than that and people will likely
be annoyed as much as they are today.

  I believe this problem is up to us to solve.
   


With my qemu hat on, I'm happy to ignore libvirt and say we need to own 
our interfaces and to compete with libvirt for users.


But with my Linux virtualization hat on, I want to see a single 
management interface that users can use without having to make a choice 
between libvirt features or libqemu features.



   Then we make virt-manager optional and this is good because we can sync
features way faster and we don't have to care about _managing_ several
VMs, our world in terms of usability and maintainability is about one VM.

   IMVHO, everything else should be done by third-party tools like libvirt,
we just provide the means for it.

   

We need to have a common management interface for third party tools.
 

  QMP? :-)


Only if QMP is compatible with libvirt.  I don't want a user to have to 
choose between QMP and libvirt.

So far, a libqemu.so with a flexible transport that could be used
directly by a libvirt user (ala cairo/gdk type interactions) seems like
the best solution to me.
 

  I tend to disagree.

  First, I think we should invest our time and effort on the text protocol
business, which is QMP. Having yet another public interface will likely split
efforts a bit and will make clients' life harder (which one should I choose?
What if they get out of sync?). Not to mention that I think Paul has a point,
if QMP is not useful here, why do we have it in the first place (vs. a C library
from the beginning)?

  You mentioned dynamic dispatch, but this is useful only for C clients right?
If so, what C clients you expected beyond libvirt?


Users want a C API.  I don't agree that libvirt is the only C interface 
consumer out there.



  Note that libvirt has added
a new events API recently.

  The second most important point for me is: why do you believe that
libqemu.so is going to improve things? Do you expect that libvirt will
sync faster?


With GDK and Cairo, when Cairo adds a new feature, GDK doesn't have to 
do anything to support it.  Users just get a cairo context from GDK and 
use the cairo API directly.


GDK provides a higher level interface for 2d operations that is more 
platform agnostic, and users can choice to use that or write directly to 
the cairo API.



  If this is the case, I think it will be as slower as it's
currently, as the problem is not the availability of interfaces, but
most likely community integration.

  I like the idea of having a transient qemu-specific API in libvirt,
as suggested by someone in this thread.
   


I really think what we want is for a libvirt user to be able to call 
libqemu functions directly.  There shouldn't have to be libvirt specific 
functions for every operation we expose.


Regards,

Anthony Liguori





Re: [Qemu-devel] TBL register permissions for PPC

2010-03-24 Thread Dmitry Ilyevsky
Hello All,

Please review patch for TBL SPR read access for generic PPC.

*Description:*

POWER specification docs define TBL/TBU SPRs as readable in user
and privileged modes. Therefore SPRs permissions were changed in gen_tbl
function in target-ppc/translate_init.c file.

*Testing:*

Tested with vxworks-6.2 bsp and OS on custom qemu board that includes ppc405
emulated core


BR,
Dmitry Ilyevsky

On Wed, Dec 2, 2009 at 2:23 AM, Alexander Graf ag...@suse.de wrote:


 On 01.12.2009, at 19:33, Dima Ilyevsky wrote:

  Hello All,
 
  I have a question about read permissions of TBL SPR for all ppc
 processors:
  I have discovered that my application, compiled by WindRiver diab
 compiler and running in vxworks OS on ppc405 architecture bumps into
 exception generated when trying to read TBL or TBU registers:

 Unless Linux does something funky, mftlb, mftbu (and mftb on 64 bit) are
 readable from PR=1.

 int main()
 {
long tbu=0, tbl=0;

asm(mftbu %0 : =r (tbu));
asm(mftbl %0 : =r (tbl));

printf(TB: %#x %#x\n, tbl, tbu);
 }

 ag...@lychee:/tmp ./mftb
 TB: 0xc0397180 0x603

 However it can't be written to:

 asm(mttbl %0 : : r (tbl));

 ag...@lychee:/tmp ./mftb
 Illegal instruction


 So yes, I'd suspect a bug in qemu here. Feel free to send a patch.

 Alex

From 141bf29f5355f163205c57e98590730ed15bfb86 Mon Sep 17 00:00:00 2001
From: n/a inst...@ubuntu-desktop.(none)
Date: Thu, 25 Mar 2010 00:22:25 +0300
Subject: [PATCH] Generic PowerPC time base SPR should be accessible in user/priv modes for reading

---
 target-ppc/translate_init.c |8 
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
index db4dc17..e8eadf4 100644
--- a/target-ppc/translate_init.c
+++ b/target-ppc/translate_init.c
@@ -777,16 +777,16 @@ static void gen_tbl (CPUPPCState *env)
  spr_read_tbl, SPR_NOACCESS,
  0x);
 spr_register(env, SPR_TBL,   TBL,
- SPR_NOACCESS, SPR_NOACCESS,
- SPR_NOACCESS, spr_write_tbl,
+ spr_read_tbl, SPR_NOACCESS,
+ spr_read_tbl, spr_write_tbl,
  0x);
 spr_register(env, SPR_VTBU,  TBU,
  spr_read_tbu, SPR_NOACCESS,
  spr_read_tbu, SPR_NOACCESS,
  0x);
 spr_register(env, SPR_TBU,   TBU,
- SPR_NOACCESS, SPR_NOACCESS,
- SPR_NOACCESS, spr_write_tbu,
+ spr_read_tbu, SPR_NOACCESS,
+ spr_read_tbu, spr_write_tbu,
  0x);
 }
 
-- 
1.7.0



[Qemu-devel] Re: [PATCH v2] update bochs vbe interface

2010-03-24 Thread Juan Quintela
Gerd Hoffmann kra...@redhat.com wrote:
 The bochs vbe interface got a new register a while back, which specifies
 the linear framebuffer size in 64k units.  This patch adds support for
 the new register to qemu.  With this patch applied vgabios 0.6c works
 with qemu.

 [ v2:  Don't savevm the new register.  Doing so breaks migration,
and as it carries read-only information for the guest there
is no need to save it. ]


It don't compile (as expected).  VMSTATE_UINT16_ARRAY() checks that you
sent the whole array.


/mnt/kvm/qemu/qemu-negotiate/hw/vga.c:2219: error: invalid operands to binary - 
(have ‘uint16_t (*)[10]’ and ‘uint16_t (*)[11]’)
make[1]: *** [vga.o] Error 1
make[1]: *** Waiting for unfinished jobs
^Cmake[1]: *** [translate.o] Interrupt
make[1]: *** [op_helper.o] Interrupt
make: *** [subdir-x86_64-softmmu] Interrupt

 ---
  hw/vga.c |5 +++--
  hw/vga_int.h |6 +-
  2 files changed, 8 insertions(+), 3 deletions(-)

 diff --git a/hw/vga.c b/hw/vga.c
 index 6a1a059..f9e07cf 100644
 --- a/hw/vga.c
 +++ b/hw/vga.c
 @@ -1955,7 +1955,8 @@ void vga_common_reset(VGACommonState *s)
  #ifdef CONFIG_BOCHS_VBE
  s-vbe_index = 0;
  memset(s-vbe_regs, '\0', sizeof(s-vbe_regs));
 -s-vbe_regs[VBE_DISPI_INDEX_ID] = VBE_DISPI_ID0;
 +s-vbe_regs[VBE_DISPI_INDEX_ID] = VBE_DISPI_ID5;

Now, to show my ignorance, what does this change means?

I can't understand it looking at the whole file (but I don't understand
vga.c too well anyways).

Later, Juan.




[Qemu-devel] Re: [PATCH] update bochs vbe interface

2010-03-24 Thread Juan Quintela
Gerd Hoffmann kra...@redhat.com wrote:
 On 03/24/10 18:04, Juan Quintela wrote:
 Gerd Hoffmannkra...@redhat.com  wrote:
 The bochs vbe interface got a new register a while back, which specifies
 the linear framebuffer size in 64k units.  This patch adds support for
 the new register to qemu.  With this patch applied vgabios 0.6c works
 with qemu.

 Signed-off-by: Gerd Hoffmannkra...@redhat.com

 It breaks migration.

 vga.c:2218:VMSTATE_UINT16_ARRAY(vbe_regs, VGACommonState, 
 VBE_DISPI_INDEX_NB),

 2218 is the interesting line.  Can we freeze this patch until the
 subsections stuff is done?

 Yea, I see.  Well, the new register doesn't carry any state, it is
 read-only information for the guest.  So the easy way out is to simply
 not save it as we don't have to.

Thinking a bit more about it.

Humm, I think it means.  Can you migrate from a new vga to an old one,
and maintain it working?

-s-vbe_regs[VBE_DISPI_INDEX_ID] = VBE_DISPI_ID0;
+s-vbe_regs[VBE_DISPI_INDEX_ID] = VBE_DISPI_ID5;
+s-vbe_regs[VBE_DISPI_INDEX_VIDEO_MEMORY_64K] = s-vram_size / (64 * 1024);

After migration, vbe_regs[VBE_DISPI_INDEX_ID] would have the value
VBE_DISPI_ID5, but vbe_regs[VBE_DISPI_INDEX_VIDEO_MEMORY_64K] will have
any random value, no?


 cheers,
   Gerd




Re: [Qemu-devel] Re: Compile files only once: some planning

2010-03-24 Thread Paul Brook
 But now there is a bigger problem, how to pass the property to the
 device. It's not fair to require the user to remember to set it.

It should not be a property of the device. All devices have a native 
endianness (for PCI this is little-endian), and the intermediate 
busses/interconnects should determine whether byteswapping occurs.

Paul




[Qemu-devel] Significant performance regression in qemu-system-mips.

2010-03-24 Thread Rob Landley
I have a native build under qemu that gets killed if it doesn't produce a line 
of output for 60 seconds (hang detection enforced by the host monitoring 
qemu's stdout with --nographic, not from within qemu).

In the most recent release version, it never came close to triggering on mips 
with a 30 second timeout.  In the current -git version (well, as of Thursday 
anyway), it triggers frequently (about 90% of the time) even with a 60 second 
timeout.

I bisected it to this:

commit 1828be316f6637d43dd4c4f5f32925b17fb8107f
Author: Paolo Bonzini pbonz...@redhat.com
Date:   Wed Mar 10 11:38:41 2010 +0100

more alarm timer cleanup

The timer_alarm_pending variable is related to the alarm timer but not
placed in the struct.  Also, in qemu_mod_timer the wrong flag was being
tested: the timer is rearmed in the alarm timer bottom half, so the
right flag to test there is the pending flag.

Finally, I hoisted the NULL checks from alarm_has_dynticks to
host_alarm_handler.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Anthony Liguori aligu...@us.ibm.com

Reverting that patch fixed it (git show HEAD | patch -R -p1), by which I mean 
three consecutive runs with 30 second timeout didn't trigger the hang 
detection.

Unfortunately, I can't revert that patch in current origin/master because most 
of the hunks fail...

Help?

Rob
-- 
Latency is more important than throughput. It's that simple. - Linus Torvalds




  1   2   >