Re: [Qemu-devel] Re: Supporting hypervisor specific APIs in libvirt
On 03/24/10 00:13, Jamie Lokier wrote: Gerd Hoffmann wrote: - networking: man, setting networking is a mess, libvirt just does it for you. +1 Even when not using libvirt for a reason or another I usually hook my virtual machines into virbr0 (libvirt default network). I had the opposite problem. Needed to use multiple bridges and have some VMs behind NAT without a bridge (private IPs), and some using separately firewalled bridges (needed to behave like real attached hardware with their original MACs, but be firewalled). No problem in theory. libvirt should detect existing bridges and allow you to attach virtual machines to them. So you can setup bridges and firewalling for them using usual distro tools and use them for virtual machines. In practice I've seen this not working correctly in the past, i.e. my br0 didn't pop up in the virt-manager nic setup page. cheers, Gerd
[Qemu-devel] Re: Supporting hypervisor specific APIs in libvirt
Andi Kleen a...@firstfloor.org wrote: Juan Quintela quint...@redhat.com writes: - networking: man, setting networking is a mess, libvirt just does it for you. Agreed it's messy, but isn't this something that the standard qemu command line tool could potentially do better by itself? I don't see why you need a wrapper for that. In my case, basically it is MAC addresses. I have dhcp setup, and it always give the same IP to the same MAC. But you have to remember to type the MAC addresses. This is the typical command line that virsh start launch for me: /usr/libexec/qemu-kvm -S -M pc-0.12 -enable-kvm -m 1024 -smp 2,sockets=2,cores=1,threads=1 -name f12X-64 -uuid 1fbe73a6-f519-e848-03bd-6636f765d143 -nodefaults -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/f12X-64.monitor,server,nowait -mon chardev=monitor,mode=readline -rtc base=utc -boot c -drive file=/mnt/kvm/images/f12X-64.img,if=none,id=drive-virtio-disk0,boot=on,cache=none -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -device virtio-net-pci,vlan=0,id=net0,mac=54:52:00:44:72:e6,bus=pci.0,addr=0x5 -net tap,fd=18,vlan=0,name=hostnet0 -chardev pty,id=serial0 -device isa-serial,chardev=serial0 -usb -device usb-tablet,id=input0 -vnc 127.0.0.1:0 -k es -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 From parts: /usr/libexec/qemu-kvm -S I don't want that -M pc-0.12 I don't care. -enable-kvm I _want_ :) -m 1024 Also god idea -smp 2,sockets=2,cores=1,threads=1 by hand it is always -smp 2 -name f12X-64 -uuid 1fbe73a6-f519-e848-03bd-6636f765d143 don't care -nodefaults -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/f12X-64.monitor,server,nowait -mon chardev=monitor,mode=readline this is simplified as: -monitor stdio when I launch it by hand. -rtc base=utc -boot c don't care -drive file=/mnt/kvm/images/f12X-64.img,if=none,id=drive-virtio-disk0,boot=on,cache=none -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 this is _wow_, I only want to put the disk image path and convince it to use virtio driver -device virtio-net-pci,vlan=0,id=net0,mac=54:52:00:44:72:e6,bus=pci.0,addr=0x5 -net tap,fd=18,vlan=0,name=hostnet0 this always have to be changed. s/fd=18/script=/etc/kvm-ifup/ and then I normally found that I want downscript= to avoid the warning at exit time. If I don't put a mac address, qemu command line works well, but as I normally also use vnc I have to: - launch qemu - kill it, relaunch with -vnc :0 instead of -vnc 127.0.0.1:0 - re-launch qemu - connect to vnc - check what address the dhcp server was giving to it this time - I can ssh to the client now with libvirt handling the command line, I just ssh to the same dhcp address that was given the previous time/day/... -chardev pty,id=serial0 -device isa-serial,chardev=serial0 I only use serial from time to time, and using -serial tcp:0,server,nowait (or whatever is the sintax is easier by hand) -usb -device usb-tablet,id=input0 usb tablet is mandatory, just in case the guest is able to _not_ grab the mouse. -vnc 127.0.0.1:0 Allways wrong in my case, because I want to run the vnc client in a different machine. a way to convince virt-viewer to connect to a qemu launched by hand, or a way to convince libvirt to let me edit the command line will be great. -k es -vga cirrus this get right by default. -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 I normally don't use balloon. Notice for the normally I don't care bits, that at the end I always care. Why? because then somebody arrives and told me that sound don't work, and I have to edit the config file, and add sound option. add a sound option to the command line of qemu is not too complicate. The other big problem for me are snapshots, I have to remember _exactly_ what was the qemu command line with which I saved the snapshot. Guess what, I normally don't remember and end: - launching old qemu - save a new snapshot - test with the new qemu and new snapshot (because now I have the command line that I launched 5 mins before). Just in case it helps. Later, Juan.
[Qemu-devel] Re: Compile files only once: some planning
Blue Swirl blauwir...@gmail.com wrote: Hi, Here's some planning for getting most files compiled as few times as possible. Comments and suggestions are welcome. I took some thought about this at some point. Problems here start from Recursive Makefile condered Harmful (tm). Look at how we jump through hops to be able to compile things in one/other side. We have: Makefile Makefile.target (really lots of them, one for target) Makefile.hw Makefile.user If we had only a single Makefile, things in this department would be much, much easier. And no, convert to a single Makefile is not trivial either, but it would make things easier. Why do we have several Makefiles? Because we want to compile each file with different options. Why do we need to abuse so much VPATH? Because we need to bring files randomly from $(ROOT), $(ROOT)/hw $(ROOT)/$(TARGET). Problem here, there isn't a simple way to compile files for several target just once (no way to put them). Our main copmile rule is: $(QEMU_PROG): $(obj-y) $(obj-$(TARGET_BASE_ARCH)-y) $(call LINK,$(obj-y) $(obj-$(TARGET_BASE_ARCH)-y)) (notice that things compiled in Makefile are trivial, they are already compiled just once by definition, problems are for all the qemu's we compile). We could change: $(obj-$(TARGET_BASE_ARCH)-y) to something like: OBJ-TARGET=s/.o/.$(TARGET_BASE_ARCH).o/ (I forgot the subst Makefile syntax), and have the: %.$(TARGET_BASE_ARCH).o: %.c gcc $(TARGET_BASE_ARCH options) From there, as you suggested, we need some files that are not compiled by architecture, they need to be compiled by board, well, we need to add yet another level obj-$(TARGET_BOARD) or whatever. Notice that this is a lot of work, but you are needing the audit to be able to compile only once. Problem just now is that there is not a simple way to describe that information, with my proposal it gets trivial to express: obj-$(CONFIG_FOO) += foo.o # You need this for everything obj-mips-$(CONFIG_FOO) += foo.o # You need this for all mips boards obj-malta-$(CONFIG_FOO) += foo.o # You need this for all mips malta board You still need to do some different magic from hw-32/64 but it could be done this way. Once you did it this way, you now where the files are (hw or target) and you can drop the VPATH tricks. Problem with this proposal is that it is not trivial to do in little steps, and the real big advantages appear when you switch to a single Makefile at the end. vl.c: a lot of work. Maybe the CPUState stuff should be separated to a new file. That should just be a rule in Documentation. You can't but anything else in vl.c. If you move anything out of vl.c (see timers work from bonzini for example), you get a wild card for free commit bypassing maintainers or some similar price :) rest of files I haven't really looked at them at depth. I looked when I cleaned up the build system, I thought how to do the next step (outlined before), but got sidetracked by other more urgent things. Later, Juan.
[Qemu-devel] Re: Exposing monitor on socket interface?
Jun Koi junkoi2...@gmail.com wrote: Hi, Is it possible to use -monitor option to expose the monitor on socket interface, such as TCP or Unix domain port, so I can access the monitor using non-stdio way? man qemu search -monitor -monitor dev Redirect the monitor to host device dev (same devices as the serial port). The default device is vc in graphical mode and stdio in non graphical mode. search -serial -serial dev Redirect the virtual serial port to host character device dev. The default device is vc in graphical mode and stdio in non graphical mode. This option can be used several times to simulate up to 4 serial ports. Use -serial none to disable all serial ports. Available character devices are: tcp:[host]:port[,server][,nowait][,nodelay] The TCP Net Console has two modes of operation. It can send the serial I/O to a location or wait for a connection from a location. By default the TCP Net Console is sent to host at the port. If you use the server option QEMU will wait for a client socket application to connect to the port before continuing, unless the nowait option was specified. The nodelay option disables the Nagle buffering algorithm. If host is omitted, 0.0.0.0 is assumed. Only one TCP connection at a time is accepted. You can use telnet to connect to the corresponding character device. Example to send tcp console to 192.168.0.2 port -serial tcp:192.168.0.2: Example to listen and wait on port for connection -serial tcp::,server Example to not wait and listen on ip 192.168.0.100 port -serial tcp:192.168.0.100:,server,nowait telnet:host:port[,server][,nowait][,nodelay] The telnet protocol is used instead of raw tcp sockets. The options work the same as if you had specified -serial tcp. The difference is that the port acts like a telnet server or client using telnet option negotiation. This will also allow you to send the MAGIC_SYSRQ sequence if you use a telnet that supports sending the break sequence. Typically in unix telnet you do it with Control-] and then type send break followed by pressing the enter key. I think that it is difficult to get more options that qemu in that department :-) Later, Juan.
[Qemu-devel] Re: Compile files only once: some planning
The harder cases are those where the device code depends somehow on the architecture. Some thoughts follow. vl.c: a lot of work. Maybe the CPUState stuff should be separated to a new file. dma.c: DMA_schedule needs access to CPUState. Most users of CPUState (e.g. qemu-timer.c and hw/dma.c) either need it as an opaque pointer, or only need access to target-independent stuff. So you could: 1) make CPUState define only common fields. Include CPUState at the beginning of each per-target CPUXYZState. 2) Do s/CPUState/CPUXYZState/ on target-*/*. 3) Make it compile, possibly by undoing parts of 2) and changing parts of it to DO_UPCAST. Paolo
[Qemu-devel] Re: [RFC] vhost-blk implementation
On Tue, Mar 23, 2010 at 12:55:07PM -0700, Badari Pulavarty wrote: Michael S. Tsirkin wrote: On Tue, Mar 23, 2010 at 10:57:33AM -0700, Badari Pulavarty wrote: Michael S. Tsirkin wrote: On Mon, Mar 22, 2010 at 05:34:04PM -0700, Badari Pulavarty wrote: Write Results: == I see degraded IO performance when doing sequential IO write tests with vhost-blk compared to virtio-blk. # time dd of=/dev/vda if=/dev/zero bs=2M oflag=direct I get ~110MB/sec with virtio-blk, but I get only ~60MB/sec with vhost-blk. Wondering why ? Try to look and number of interrupts and/or number of exits. I checked interrupts and IO exits - there is no major noticeable difference between vhost-blk and virtio-blk scenerios. It could also be that you are overrunning some queue. I don't see any exit mitigation strategy in your patch: when there are already lots of requests in a queue, it's usually a good idea to disable notifications and poll the queue as requests complete. That could help performance. Do you mean poll eventfd for new requests instead of waiting for new notifications ? Where do you do that in vhost-net code ? vhost_disable_notify does this. Unlike network socket, since we are dealing with a file, there is no -poll support for it. So I can't poll for the data. And also, Issue I am having is on the write() side. Not sure I understand. I looked at it some more - I see 512K write requests on the virtio-queue in both vhost-blk and virtio-blk cases. Both qemu or vhost is doing synchronous writes to page cache (there is no write batching in qemu that is affecting this case). I still puzzled on why virtio-blk outperforms vhost-blk. Thanks, Badari If you say the number of requests is the same, we are left with: - requests are smaller for some reason? - something is causing retries? No. IO requests sizes are exactly same (512K) in both cases. There are no retries or errors in both cases. One thing I am not clear is - for some reason guest kernel could push more data into virtio-ring in case of virtio-blk vs vhost-blk. Is this possible ? Does guest gets to run much sooner in virtio-blk case than vhost-blk ? Sorry, if its dumb question - I don't understand all the vhost details :( Thanks, Badari You said you observed same number of requests in userspace versus kernel above. And request size is the same as well. But somehow more data is transferred? I'm confused. -- MST
[Qemu-devel] Re: Exposing monitor on socket interface?
Thanks a lot, Juan! Jun On Wed, Mar 24, 2010 at 6:41 PM, Juan Quintela quint...@redhat.com wrote: Jun Koi junkoi2...@gmail.com wrote: Hi, Is it possible to use -monitor option to expose the monitor on socket interface, such as TCP or Unix domain port, so I can access the monitor using non-stdio way? man qemu search -monitor -monitor dev Redirect the monitor to host device dev (same devices as the serial port). The default device is vc in graphical mode and stdio in non graphical mode. search -serial -serial dev Redirect the virtual serial port to host character device dev. The default device is vc in graphical mode and stdio in non graphical mode. This option can be used several times to simulate up to 4 serial ports. Use -serial none to disable all serial ports. Available character devices are: tcp:[host]:port[,server][,nowait][,nodelay] The TCP Net Console has two modes of operation. It can send the serial I/O to a location or wait for a connection from a location. By default the TCP Net Console is sent to host at the port. If you use the server option QEMU will wait for a client socket application to connect to the port before continuing, unless the nowait option was specified. The nodelay option disables the Nagle buffering algorithm. If host is omitted, 0.0.0.0 is assumed. Only one TCP connection at a time is accepted. You can use telnet to connect to the corresponding character device. Example to send tcp console to 192.168.0.2 port -serial tcp:192.168.0.2: Example to listen and wait on port for connection -serial tcp::,server Example to not wait and listen on ip 192.168.0.100 port -serial tcp:192.168.0.100:,server,nowait telnet:host:port[,server][,nowait][,nodelay] The telnet protocol is used instead of raw tcp sockets. The options work the same as if you had specified -serial tcp. The difference is that the port acts like a telnet server or client using telnet option negotiation. This will also allow you to send the MAGIC_SYSRQ sequence if you use a telnet that supports sending the break sequence. Typically in unix telnet you do it with Control-] and then type send break followed by pressing the enter key. I think that it is difficult to get more options that qemu in that department :-) Later, Juan.
[Qemu-devel] Re: Completing big real mode emulation
On Saturday 20 March 2010 23:00:49 Alexander Graf wrote: Am 20.03.2010 um 15:02 schrieb Mohammed Gamal m.gamal...@gmail.com: On Sat, Mar 20, 2010 at 3:18 PM, Avi Kivity a...@redhat.com wrote: On 03/20/2010 10:55 AM, Alexander Graf wrote: I'd say that a GSoC project would rather focus on making a guest OS work than working on generic big real mode. Having Windows 98 support is way more visible to the users. And hopefully more fun to implement too, as it's a visible goal :-). Big real mode allows you to boot various OSes, such as that old Ubuntu/SuSE boot loader which triggered the whole thing. I thought legacy Windows uses it too? IIRC even current Windows (last I checked was XP, but it's probably true for newer) invokes big real mode inadvertently. All it takes is not to clear fs and gs while switching to real mode. It works because the real mode code never uses gs and fs (i.e. while we are technically in big real mode, the guest never relies on this), and because there are enough hacks in vmx.c to make it work (restoring fs and gs after the switch back). IIRC there are other cases of invalid guest state that we hack into place during mode switches. Either way - then we should make the goal of the project to support those old boot loaders. IMHO it should contain visibility. Doing theoretical stuff is just less fun for all parties. Or does that stuff work already? Mostly those old guests aged beyond usefulness. They are still broken, but nobody installs new images. Old images installed via workarounds work. Goals for this task could include: - get those older guests working - get emulate_invalid_guest_state=1 to work on all supported guests - switch to emulate_invalid_guest_state=1 as the default - drop the code supporting emulate_invalid_guest_state=0 eventually To this end I guess the next logical step is to compile a list of guests that are currently not working/work with hacks only, and get them working. Here are some suggestions: - MINIX 3.1.6 (developers have been recently filing bug reports because of boot failures) - Win XP with emulation enabled - FreeDOS with memory extenders Any other guests you'd like to see on this list? I remember old openSUSE iso bootloaders had issues. I think it was around 10.3, but might have been earlier. At least 10u2 installer has trouble. I had spent some time on it, finally found it's due to ISOLINUX. The basic issue is it assume that SS selector/base is unchanged when enter/exit protect mode. At that time, I've cooked a hack workaround for it, but didn't think it's proper to upstream. -- regards Yang, Sheng
[Qemu-devel] Re: [libvirt] Supporting hypervisor specific APIs in libvirt
On Mon, Mar 22, 2010 at 04:49:21PM -0500, Anthony Liguori wrote: On 03/22/2010 03:10 PM, Daniel P. Berrange wrote: This isn't necessarily libvirt's problem if it's mission is to provide a common hypervisor API that covers the most commonly used features. That is more or less our current mission. If this mission leads to QEMU creating a non-libvirt based API telling people to use that instead, then I'd say libvirt's mission needs to change to avoid that scenario ! I strongly believe that libvirt's strategy is good for application developers over the medium to long term. We need to figure out how to get rid of the short term pain from the feature timelag, rather than inventing a new library API for them to use. Well that's certainly a good thing :-) However, for qemu, we need an API that covers all of our features that people can develop against. The ultimate question we need to figure out is, should we encourage our users to always use libvirt or should we build our own API for people (and libvirt) to consume. I don't think it's necessarily a big technical challenge for libvirt to support qemu more completely. I think it amounts to introducing a series of virQemu APIs that implement qemu specific functions. Over time, qemu specific APIs can be deprecated in favour of more generic virDomain APIs. Stepping back a bit first, there are the two core areas in which people can be limited by libvirt currently. 1. Monitor commands 2. Command line flags Ultimately, IIUC, you are suggesting we need to allow arbitrary passthrough for both of these in libvirt. At the libvirt level, we have 3 core requirements 1. The XML format is extend only (new elements allowed, or add attributes or children to existing elements) 2. The C library API is append only (new symbols only) 3. The RPC wire protocol is append only (maps 1-1 to the C API generally) We have a slightly different mentality within QEMU I think. Here's roughly how I'd characterize our guarantees. 1. For any two versions of QEMU, we try to guarantee that the same VM, as far as the guest sees it, can be created. 2. We tend to avoid changing command line syntax unless the syntax was previously undefined. 3. QMP supports enumeration and feature negotiation. This enables a client to discover which functions are supported. 4. We try to maintain monitor interfaces but provide no guarantees of compatibility. Points 2 4 make it very hard for libvirt to use any library API that QEMU might expose. We need to support multiple concurrently running versions of QEMU on a host, to cope with the package upgrade scenario adhoc testing of new versions. If a libqemu.so for talking to QEMU changed a monitor interface didn't have backwards compatability for older QEMU version, then it is not something we could use, because any particular libvirt build would be tied to only being able to talk to the specific QEMU version. Currently we internally deal with changes in syntax detecting which format/protocol we need to use at runtime and need to maintain that ability. Daniel -- |: Red Hat, Engineering, London-o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :| |: http://autobuild.org-o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
Re: [Qemu-devel] Re: [libvirt] Supporting hypervisor specific APIs in libvirt
On Wed, Mar 24, 2010 at 07:17:26AM +0200, Avi Kivity wrote: On 03/23/2010 08:00 PM, Avi Kivity wrote: On 03/23/2010 06:06 PM, Anthony Liguori wrote: I thought the monitor protocol *was* our API. If not, why not? It is. But our API is missing key components like guest enumeration. So the fundamental topic here is, do we introduce these missing components to allow people to build directly to our interface or do we make use of the functionality that libvirt already provides if they can plumb our API directly to users. Guest enumeration is another API. Over the kvm call I suggested a qemu concentrator that would keep track of all running qemus, and would hand out monitor connections to users. It can do the enumeration (likely using qmp). Libvirt could talk to that, like it does with other hypervisors. To elaborate qemud - daemonaizes itself - listens on /var/lib/qemud/guests for incoming guest connections - listens on /var/lib/qemud/clients for incoming client connections - filters access according to uid (SCM_CREDENTIALS) - can pass a new monitor to client (SCM_RIGHTS) - supports 'list' command to query running guests - async messages on guest startup/exit My concern is that once you provide this, then next someone wants it to list inactive guests too. Once you list inactive guests, then you'll want this to start a guest. Once you start guests then you want cgroups integration, selinux labelling so on, until it ends up replicating all of libvirt's QEMU functionality. To be able to use the list functionality from libvirt, we need this daemon to also guarentee id, name uuid uniqueness for all VMs, both running and inactive, with separate namespaces for the system vs per-user lists. Or we have to ignore any instances listed by qemud that were not started by libvirt, which rather defeats the purpose. The filtering access part of this daemon is also not mapping well onto libvirt's access model, because we don't soley filter based on UID in libvirtd. We have it configurable based on UID, policykit, SASL, TLS/x509 already, and intend adding role based access control to further filter things, integrating with the existing apparmour/selinux security models. A qemud that filters based on UID only, gives users a side-channel to get around libvirt's access control. Daniel -- |: Red Hat, Engineering, London-o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :| |: http://autobuild.org-o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
Re: [Qemu-devel] Re: [libvirt] Supporting hypervisor specific APIs in libvirt
On 03/24/2010 12:36 PM, Daniel P. Berrange wrote: On Wed, Mar 24, 2010 at 07:17:26AM +0200, Avi Kivity wrote: On 03/23/2010 08:00 PM, Avi Kivity wrote: On 03/23/2010 06:06 PM, Anthony Liguori wrote: I thought the monitor protocol *was* our API. If not, why not? It is. But our API is missing key components like guest enumeration. So the fundamental topic here is, do we introduce these missing components to allow people to build directly to our interface or do we make use of the functionality that libvirt already provides if they can plumb our API directly to users. Guest enumeration is another API. Over the kvm call I suggested a qemu concentrator that would keep track of all running qemus, and would hand out monitor connections to users. It can do the enumeration (likely using qmp). Libvirt could talk to that, like it does with other hypervisors. To elaborate qemud - daemonaizes itself - listens on /var/lib/qemud/guests for incoming guest connections - listens on /var/lib/qemud/clients for incoming client connections - filters access according to uid (SCM_CREDENTIALS) - can pass a new monitor to client (SCM_RIGHTS) - supports 'list' command to query running guests - async messages on guest startup/exit My concern is that once you provide this, then next someone wants it to list inactive guests too. That's impossible, since qemud doesn't manage config files or disk images. It can't even launch guests! Once you list inactive guests, then you'll want this to start a guest. Once you start guests then you want cgroups integration, selinux labelling so on, until it ends up replicating all of libvirt's QEMU functionality. To be able to use the list functionality from libvirt, we need this daemon to also guarentee id, name uuid uniqueness for all VMs, both running and inactive, with separate namespaces for the system vs per-user lists. Or we have to ignore any instances listed by qemud that were not started by libvirt, which rather defeats the purpose. qemud won't guarantee name uniqueness or provide uuids. The filtering access part of this daemon is also not mapping well onto libvirt's access model, because we don't soley filter based on UID in libvirtd. We have it configurable based on UID, policykit, SASL, TLS/x509 already, and intend adding role based access control to further filter things, integrating with the existing apparmour/selinux security models. A qemud that filters based on UID only, gives users a side-channel to get around libvirt's access control. That's true. Any time you write a multiplexer these issues crop up. Much better to stay in single process land where everything is already taken care of. So, at best qemud is a toy for people who are annoyed by libvirt. -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] Re: Compile files only once: some planning
On 03/24/2010 02:47 AM, Paolo Bonzini wrote: 1) make CPUState define only common fields. Include CPUState at the beginning of each per-target CPUXYZState. Irritatingly, the common fields contain quite big TLBs. And the offsets from the start of env affect the compactness of the code generated from TCG. We really really want the general registers to come first to make sure that those offsets fit the host's reg+offset addressing mode. r~
[Qemu-devel] Guest memory mapping in Qemu
Hello, This is an idle question in the sense that, much as I would like to, I know for a fact that I won't have the time to look at implementing this. I'm not expecting other people to seriously look at doing it either, but I would be interested on your thoughts. If the technical documentation at http://www.usenix.org/publications/library/proceedings/usenix05/tech/freenix/full_papers/bellard/bellard_html/index.html is still valid (I think it is), Qemu has two modes of handling access to guest memory - system emulation, in which an entire guest address space is mapped on the host, and emulated MMU. I was wondering whether something in-between would also be feasible. That is, chunks of guest address space (say 4MB chunks for the sake of the argument) are mmapped into the address space of the Qemu process on the host, and when an access to guest memory is made, there is an initial check to see whether it is in the same chunk as the last one, in which case all the MMU emulation bits could be saved. I could imagine Qemu keeping a current/most recent chunk for each register which can be used for relative addressing, plus one for non-register-relative accesses. It seems to me that this could potentially speed up memory access quite a bit, and as a bonus even make it easy to support x86 segmentation (as part of the bounds check for whether a memory access is in a chunk). I realise of course that I have glibly glossed over all the nasty bits - off the top of my head keeping track of all the mapped chunks in the host address space, lookups to see if an access outside of the current chunk is inside another mapped one, invalidating chunks when guest page tables they are based on change. I am sure that there are many more issues... Looking forward to reading any responses. Regards, Michael _ Hotmail: Trusted email with Microsoft’s powerful SPAM protection. https://signup.live.com/signup.aspx?id=60969
[Qemu-devel] Re: [RFC] vhost-blk implementation
On Tue, Mar 23, 2010 at 12:55:07PM -0700, Badari Pulavarty wrote: Michael S. Tsirkin wrote: On Tue, Mar 23, 2010 at 10:57:33AM -0700, Badari Pulavarty wrote: Michael S. Tsirkin wrote: On Mon, Mar 22, 2010 at 05:34:04PM -0700, Badari Pulavarty wrote: Write Results: == I see degraded IO performance when doing sequential IO write tests with vhost-blk compared to virtio-blk. # time dd of=/dev/vda if=/dev/zero bs=2M oflag=direct I get ~110MB/sec with virtio-blk, but I get only ~60MB/sec with vhost-blk. Wondering why ? Try to look and number of interrupts and/or number of exits. I checked interrupts and IO exits - there is no major noticeable difference between vhost-blk and virtio-blk scenerios. It could also be that you are overrunning some queue. I don't see any exit mitigation strategy in your patch: when there are already lots of requests in a queue, it's usually a good idea to disable notifications and poll the queue as requests complete. That could help performance. Do you mean poll eventfd for new requests instead of waiting for new notifications ? Where do you do that in vhost-net code ? vhost_disable_notify does this. Unlike network socket, since we are dealing with a file, there is no -poll support for it. So I can't poll for the data. And also, Issue I am having is on the write() side. Not sure I understand. I looked at it some more - I see 512K write requests on the virtio-queue in both vhost-blk and virtio-blk cases. Both qemu or vhost is doing synchronous writes to page cache (there is no write batching in qemu that is affecting this case). I still puzzled on why virtio-blk outperforms vhost-blk. Thanks, Badari If you say the number of requests is the same, we are left with: - requests are smaller for some reason? - something is causing retries? No. IO requests sizes are exactly same (512K) in both cases. There are no retries or errors in both cases. One thing I am not clear is - for some reason guest kernel could push more data into virtio-ring in case of virtio-blk vs vhost-blk. Is this possible ? Does guest gets to run much sooner in virtio-blk case than vhost-blk ? Sorry, if its dumb question - I don't understand all the vhost details :( Thanks, Badari BTW, did you put the backend in non-blocking mode? As I said, vhost net passes non-blocking flag to socket backend, but vfs_write/read that you use does not have an option to do this. So we'll need to extend the backend to fix that, but just for demo purposes, you could set non-blocking mode on the file from userspace. -- MST
[Qemu-devel] [PATCH][RESEND] qemu: jaso-parser: Output the content of invalid keyword
When input some invialid words in QMP port, qemu outputs this error message: parse error: invalid keyword `%s' This patch makes qemu output the content, like: parse error: invalid keyword `unknow_cmd' Signed-off-by: Amos Kong ak...@redhat.com --- json-parser.c |7 ++- 1 files changed, 6 insertions(+), 1 deletions(-) diff --git a/json-parser.c b/json-parser.c index 579928f..98a82af 100644 --- a/json-parser.c +++ b/json-parser.c @@ -12,6 +12,7 @@ */ #include stdbool.h +#include stdarg.h #include qemu-common.h #include qstring.h @@ -93,7 +94,11 @@ static int token_is_escape(QObject *obj, const char *value) */ static void parse_error(JSONParserContext *ctxt, QObject *token, const char *msg, ...) { -fprintf(stderr, parse error: %s\n, msg); +va_list ap; +va_start(ap, msg); +fprintf(stderr, parse error: ); +vfprintf(stderr, msg, ap); +fprintf(stderr, \n); } /** -- 1.5.5.6
Re: [Qemu-devel] [PATCH][RESEND] qemu: jaso-parser: Output the content of invalid keyword
On 03/24/2010 05:17 AM, Amos Kong wrote: -fprintf(stderr, parse error: %s\n, msg); +va_list ap; +va_start(ap, msg); +fprintf(stderr, parse error: ); +vfprintf(stderr, msg, ap); +fprintf(stderr, \n); Technically you need va_end here. r~
Re: [Qemu-devel] Re: [libvirt] Supporting hypervisor specific APIs in libvirt
On 03/24/2010 05:42 AM, Avi Kivity wrote: The filtering access part of this daemon is also not mapping well onto libvirt's access model, because we don't soley filter based on UID in libvirtd. We have it configurable based on UID, policykit, SASL, TLS/x509 already, and intend adding role based access control to further filter things, integrating with the existing apparmour/selinux security models. A qemud that filters based on UID only, gives users a side-channel to get around libvirt's access control. That's true. Any time you write a multiplexer these issues crop up. Much better to stay in single process land where everything is already taken care of. What does a multiplexer give you that making individual qemu instances discoverable doesn't give you? The later doesn't suffer from these problems. Regards, Anthony Liguori So, at best qemud is a toy for people who are annoyed by libvirt.
Re: [Qemu-devel] Re: [libvirt] Supporting hypervisor specific APIs in libvirt
I can't quite see what such a libqemu would buy us compared to straight QMP. Talking QMP should be easy, provided you got a suitable JSON library. I agree. My undesranding is this was one of the large motivations behind using JSON: It's a common protocol that already has convenient bindings in most languages. If it's hard[1] for third parties to bind QMP to their favourite language/framework then IMHO we've done it wrong. Paul [1] Hard compared to any other sane RPC mechanism. Some languages make everything hard :-)
Re: [Qemu-devel] Re: [libvirt] Supporting hypervisor specific APIs in libvirt
On 03/24/2010 02:19 PM, Anthony Liguori wrote: qemud - daemonaizes itself - listens on /var/lib/qemud/guests for incoming guest connections - listens on /var/lib/qemud/clients for incoming client connections - filters access according to uid (SCM_CREDENTIALS) - can pass a new monitor to client (SCM_RIGHTS) - supports 'list' command to query running guests - async messages on guest startup/exit Then guests run with the wrong security context. Why? They run with the security context of whoever launched them (could be libvirtd). -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] Re: [libvirt] Supporting hypervisor specific APIs in libvirt
On 03/24/2010 02:30 PM, Anthony Liguori wrote: On 03/24/2010 07:27 AM, Avi Kivity wrote: On 03/24/2010 02:19 PM, Anthony Liguori wrote: qemud - daemonaizes itself - listens on /var/lib/qemud/guests for incoming guest connections - listens on /var/lib/qemud/clients for incoming client connections - filters access according to uid (SCM_CREDENTIALS) - can pass a new monitor to client (SCM_RIGHTS) - supports 'list' command to query running guests - async messages on guest startup/exit Then guests run with the wrong security context. Why? They run with the security context of whoever launched them (could be libvirtd). Because it doesn't have the same security context as qemud and since clients have to connect to qemud, qemud has to implement access control. Yeah. It's far better to have the qemu instance advertise itself such that and client connects directly to it. Then all of the various authorization models will be applied correctly to it. Agreed. qemud-exit(). -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] Re: [libvirt] Supporting hypervisor specific APIs in libvirt
On 03/24/2010 02:32 PM, Anthony Liguori wrote: You don't get a directory filled with a zillion socket files pointing at dead guests. Agree that's a poor return on investment. Deleting it on atexit combined with flushing the whole directory at startup is a pretty reasonable solution to this (which is ultimately how the entirety of /var/run behaves). If you're really paranoid, you can fork() a helper with a shared pipe to implement unlink on close. My paranoia comes nowhere near to my dislike of forked helpers. -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] Re: [libvirt] Supporting hypervisor specific APIs in libvirt
On 03/24/2010 07:29 AM, Avi Kivity wrote: On 03/24/2010 02:23 PM, Anthony Liguori wrote: On 03/24/2010 05:42 AM, Avi Kivity wrote: The filtering access part of this daemon is also not mapping well onto libvirt's access model, because we don't soley filter based on UID in libvirtd. We have it configurable based on UID, policykit, SASL, TLS/x509 already, and intend adding role based access control to further filter things, integrating with the existing apparmour/selinux security models. A qemud that filters based on UID only, gives users a side-channel to get around libvirt's access control. That's true. Any time you write a multiplexer these issues crop up. Much better to stay in single process land where everything is already taken care of. What does a multiplexer give you that making individual qemu instances discoverable doesn't give you? The later doesn't suffer from these problems. You don't get a directory filled with a zillion socket files pointing at dead guests. Agree that's a poor return on investment. Deleting it on atexit combined with flushing the whole directory at startup is a pretty reasonable solution to this (which is ultimately how the entirety of /var/run behaves). If you're really paranoid, you can fork() a helper with a shared pipe to implement unlink on close. Regards, Anthony Liguori Maybe we want a O_UNLINK_ON_CLOSE for unix domain sockets - but no, that's not implementable.
Re: [Qemu-devel] Re: [libvirt] Supporting hypervisor specific APIs in libvirt
On 03/23/2010 09:24 PM, Anthony Liguori wrote: We also provide an API for guest creation (the qemu command line). As an aside, I'd like to see all command line options have qmp equivalents (most of them can be implemented with a 'set' command that writes qdev values). This allows a uniform way to control a guest, whether at startup or runtime. You start with a case, cold-plug a motherboard, cpus, memory, disk controllers, and power it on. The main blocker to this is converting all the devices to qdev. partial conversions are not sufficient. It's approximately the same problem as a machine config file. If you have one then the other should be fairly trivial. IMO the no_user flag is a bug, and should not exist. Paul
Re: [Qemu-devel] Re: [libvirt] Supporting hypervisor specific APIs in libvirt
On 03/24/2010 07:27 AM, Avi Kivity wrote: On 03/24/2010 02:19 PM, Anthony Liguori wrote: qemud - daemonaizes itself - listens on /var/lib/qemud/guests for incoming guest connections - listens on /var/lib/qemud/clients for incoming client connections - filters access according to uid (SCM_CREDENTIALS) - can pass a new monitor to client (SCM_RIGHTS) - supports 'list' command to query running guests - async messages on guest startup/exit Then guests run with the wrong security context. Why? They run with the security context of whoever launched them (could be libvirtd). Because it doesn't have the same security context as qemud and since clients have to connect to qemud, qemud has to implement access control. It's far better to have the qemu instance advertise itself such that and client connects directly to it. Then all of the various authorization models will be applied correctly to it. Regards, Anthony Liguori
Re: [Qemu-devel] Re: [libvirt] Supporting hypervisor specific APIs in libvirt
On 03/24/2010 02:30 PM, Paul Brook wrote: On 03/23/2010 09:24 PM, Anthony Liguori wrote: We also provide an API for guest creation (the qemu command line). As an aside, I'd like to see all command line options have qmp equivalents (most of them can be implemented with a 'set' command that writes qdev values). This allows a uniform way to control a guest, whether at startup or runtime. You start with a case, cold-plug a motherboard, cpus, memory, disk controllers, and power it on. The main blocker to this is converting all the devices to qdev. partial conversions are not sufficient. It's approximately the same problem as a machine config file. If you have one then the other should be fairly trivial. Agreed. IMO the no_user flag is a bug, and should not exist. Sorry, what's that? -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] Re: [libvirt] Supporting hypervisor specific APIs in libvirt
On 03/24/2010 02:23 PM, Anthony Liguori wrote: On 03/24/2010 05:42 AM, Avi Kivity wrote: The filtering access part of this daemon is also not mapping well onto libvirt's access model, because we don't soley filter based on UID in libvirtd. We have it configurable based on UID, policykit, SASL, TLS/x509 already, and intend adding role based access control to further filter things, integrating with the existing apparmour/selinux security models. A qemud that filters based on UID only, gives users a side-channel to get around libvirt's access control. That's true. Any time you write a multiplexer these issues crop up. Much better to stay in single process land where everything is already taken care of. What does a multiplexer give you that making individual qemu instances discoverable doesn't give you? The later doesn't suffer from these problems. You don't get a directory filled with a zillion socket files pointing at dead guests. Agree that's a poor return on investment. Maybe we want a O_UNLINK_ON_CLOSE for unix domain sockets - but no, that's not implementable. -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] Re: [libvirt] Supporting hypervisor specific APIs in libvirt
On 03/24/2010 07:25 AM, Paul Brook wrote: I can't quite see what such a libqemu would buy us compared to straight QMP. Talking QMP should be easy, provided you got a suitable JSON library. I agree. My undesranding is this was one of the large motivations behind using JSON: It's a common protocol that already has convenient bindings in most languages. If it's hard[1] for third parties to bind QMP to their favourite language/framework then IMHO we've done it wrong. You can't have convenient bindings to an RPC in C because it doesn't support dynamic dispatch. With most types of RPC, you have an IDL description and a code generator. But regardless of that, there are advantages to us providing a libqemu. The biggest one is that we can standardize transport implementations that include discovery mechanisms. If the core of libqemu provided an extensible transport interface, and a generic QMP request/completion mechanism, in a Python binding, you would never use the IDL generated wrappers but instead use dynamic dispatch to invoke arbitrary QMP requests. But the advantage is that if libvirt provided an API for a QMP transport encapsulated in their secure protocol, then provided the plumbed that API through their Python interface, you could use it for free in Python without having to reinvent the wheel. Regards, Anthony Liguori Paul [1] Hard compared to any other sane RPC mechanism. Some languages make everything hard :-)
Re: [Qemu-devel] Re: [libvirt] Supporting hypervisor specific APIs in libvirt
IMO the no_user flag is a bug, and should not exist. Sorry, what's that? Usually an indication that a device has been incorrectly or inproperly converted to the qdev interface. Paul
Re: [Qemu-devel] Guest memory mapping in Qemu
If the technical documentation at http://www.usenix.org/publications/library/proceedings/usenix05/tech/freeni x/full_papers/bellard/bellard_html/index.html is still valid (I think it is), Qemu has two modes of handling access to guest memory - system emulation, in which an entire guest address space is mapped on the host, and emulated MMU. No. qemu-fast (using the host address space) was removed long ago. There are a few stray remnants, but nothing useful. We always use an emulated MMU. I was wondering whether something in-between would also be feasible. That is, chunks of guest address space (say 4MB chunks for the sake of the argument) are mmapped into the address space of the Qemu process on the host, and when an access to guest memory is made, there is an initial check to see whether it is in the same chunk as the last one, in which case all the MMU emulation bits could be saved. I could imagine Qemu keeping a current/most recent chunk for each register which can be used for relative addressing, plus one for non-register-relative accesses. It seems to me that this could potentially speed up memory access quite a bit, and as a bonus even make it easy to support x86 segmentation (as part of the bounds check for whether a memory access is in a chunk). This is effectively shadow paging implemented in userspace via mmap. It's very hard to make it work in a sane way, and even harder to make it go fast. TLB handling is already a significant bottleneck for many tasks, adding a mmap call is likely to make this orders of magnitude worse. Most guests use virtual memory extensively, so the virtual-physical mappings tend to be extremely fragmented. If you really want to do shadow paging for cross environments, you probably need to move it into kernel space. Either as a host kernel module, or as a bare-metal kernel/application that runs inside KVM. Even then you have to use various tricks to partition off a section of the host address space for use by qemu. It's not impossible, but it is a significant undertaking with somewhat unclear benefits. Paul
Re: [Qemu-devel] Re: Supporting hypervisor specific APIs in libvirt
On 03/24/2010 03:59 AM, Gerd Hoffmann wrote: On 03/24/10 00:13, Jamie Lokier wrote: Gerd Hoffmann wrote: - networking: man, setting networking is a mess, libvirt just does it for you. +1 Even when not using libvirt for a reason or another I usually hook my virtual machines into virbr0 (libvirt default network). I had the opposite problem. Needed to use multiple bridges and have some VMs behind NAT without a bridge (private IPs), and some using separately firewalled bridges (needed to behave like real attached hardware with their original MACs, but be firewalled). No problem in theory. libvirt should detect existing bridges and allow you to attach virtual machines to them. So you can setup bridges and firewalling for them using usual distro tools and use them for virtual machines. In practice I've seen this not working correctly in the past, i.e. my br0 didn't pop up in the virt-manager nic setup page. Please file a bug: virt-manager has had bridge detection for years, so something must be going wrong. In f13 we will use netcf for this, so even bridge enumeration on remote hosts should work. Additionally I recently made a change upstream to allow users to manually enter a bridge name, since netcf isn't supported for all distros yet. - Cole
Re: [Qemu-devel] Re: Supporting hypervisor specific APIs in libvirt
In practice I've seen this not working correctly in the past, i.e. my ^^^ br0 didn't pop up in the virt-manager nic setup page. Please file a bug: virt-manager has had bridge detection for years, so something must be going wrong. Works for me now ;) In f13 we will use netcf for this, so even bridge enumeration on remote hosts should work. Yes, remote connections. F12 + virt-preview works. cheers, Gerd
[Qemu-devel] [PATCH] qemu: jaso-parser: Output the content of invalid keyword
When input some invialid words in QMP port, qemu outputs this error message: parse error: invalid keyword `%s' This patch makes qemu output the content, like: parse error: invalid keyword `unknow_cmd' Signed-off-by: Amos Kong ak...@redhat.com --- json-parser.c |7 ++- 1 files changed, 6 insertions(+), 1 deletions(-) diff --git a/json-parser.c b/json-parser.c index 579928f..98a82af 100644 --- a/json-parser.c +++ b/json-parser.c @@ -12,6 +12,7 @@ */ #include stdbool.h +#include stdarg.h #include qemu-common.h #include qstring.h @@ -93,7 +94,11 @@ static int token_is_escape(QObject *obj, const char *value) */ static void parse_error(JSONParserContext *ctxt, QObject *token, const char *msg, ...) { -fprintf(stderr, parse error: %s\n, msg); +va_list ap; +va_start(ap, msg); +fprintf(stderr, parse error: ); +vfprintf(stderr, msg, ap); +fprintf(stderr, \n); } /** -- 1.5.5.6
[Qemu-devel] [PATCH] update bochs vbe interface
The bochs vbe interface got a new register a while back, which specifies the linear framebuffer size in 64k units. This patch adds support for the new register to qemu. With this patch applied vgabios 0.6c works with qemu. Signed-off-by: Gerd Hoffmann kra...@redhat.com --- hw/vga.c |3 ++- hw/vga_int.h |4 +++- 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/hw/vga.c b/hw/vga.c index 6a1a059..a5c6997 100644 --- a/hw/vga.c +++ b/hw/vga.c @@ -1955,7 +1955,8 @@ void vga_common_reset(VGACommonState *s) #ifdef CONFIG_BOCHS_VBE s-vbe_index = 0; memset(s-vbe_regs, '\0', sizeof(s-vbe_regs)); -s-vbe_regs[VBE_DISPI_INDEX_ID] = VBE_DISPI_ID0; +s-vbe_regs[VBE_DISPI_INDEX_ID] = VBE_DISPI_ID5; +s-vbe_regs[VBE_DISPI_INDEX_VIDEO_MEMORY_64K] = s-vram_size / (64 * 1024); s-vbe_start_addr = 0; s-vbe_line_offset = 0; s-vbe_bank_mask = (s-vram_size 16) - 1; diff --git a/hw/vga_int.h b/hw/vga_int.h index 23a42ef..4b8fc74 100644 --- a/hw/vga_int.h +++ b/hw/vga_int.h @@ -47,13 +47,15 @@ #define VBE_DISPI_INDEX_VIRT_HEIGHT 0x7 #define VBE_DISPI_INDEX_X_OFFSET0x8 #define VBE_DISPI_INDEX_Y_OFFSET0x9 -#define VBE_DISPI_INDEX_NB 0xa +#define VBE_DISPI_INDEX_VIDEO_MEMORY_64K 0xa +#define VBE_DISPI_INDEX_NB 0xb #define VBE_DISPI_ID0 0xB0C0 #define VBE_DISPI_ID1 0xB0C1 #define VBE_DISPI_ID2 0xB0C2 #define VBE_DISPI_ID3 0xB0C3 #define VBE_DISPI_ID4 0xB0C4 +#define VBE_DISPI_ID5 0xB0C5 #define VBE_DISPI_DISABLED 0x00 #define VBE_DISPI_ENABLED 0x01 -- 1.6.6.1
Re: [Qemu-devel] Re: Compile files only once: some planning
On 03/24/2010 12:19 PM, Richard Henderson wrote: On 03/24/2010 02:47 AM, Paolo Bonzini wrote: 1) make CPUState define only common fields. Include CPUState at the beginning of each per-target CPUXYZState. Irritatingly, the common fields contain quite big TLBs. And the offsets from the start of env affect the compactness of the code generated from TCG. We really really want the general registers to come first to make sure that those offsets fit the host's reg+offset addressing mode. What about adding a 512-bytes (or more) block or something like that at the beginning of CPUState with a union, so you can put the per-target stuff there? Paolo
RE: [Qemu-devel] Guest memory mapping in Qemu
I was wondering whether something in-between would also be feasible. That is, chunks of guest address space (say 4MB chunks for the sake of the argument) are mmapped into the address space of the Qemu process on the host, and when an access to guest memory is made, there is an initial check to see whether it is in the same chunk as the last one, in which case all the MMU emulation bits could be saved. I could imagine Qemu keeping a current/most recent chunk for each register which can be used for relative addressing, plus one for non-register-relative accesses. It seems to me that this could potentially speed up memory access quite a bit, and as a bonus even make it easy to support x86 segmentation (as part of the bounds check for whether a memory access is in a chunk). This is effectively shadow paging implemented in userspace via mmap. It's very hard to make it work in a sane way, and even harder to make it go fast. Yes, I can imagine that. TLB handling is already a significant bottleneck for many tasks, adding a mmap call is likely to make this orders of magnitude worse. That might still depend on how much old mappings get reused and how often it would be necessary to create new ones. I am tempted to do a bit of profiling of the memory usage patterns of a few guests to make an estimate. Does Qemu have any built-in statistics code that could be useful for this? [snip] If you really want to do shadow paging for cross environments, you probably need to move it into kernel space. That isn't as interesting, as there are already people doing that sort of thing. The attractive thing about Qemu's emulation mode is that is is pure userspace! Thanks for commenting. Michael _ Hotmail: Free, trusted and rich email service. https://signup.live.com/signup.aspx?id=60969
[Qemu-devel] [PATCH 01/15] virtio-serial: save/load: Ensure target has enough ports
The target could be started with max_nr_ports for a virtio-serial device lesser than what was available on the source machine. Fail the migration in such a case. Signed-off-by: Amit Shah amit.s...@redhat.com Reported-by: Juan Quintela quint...@redhat.com --- hw/virtio-serial-bus.c | 13 +++-- 1 files changed, 11 insertions(+), 2 deletions(-) diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c index 17c1ec1..9a7f0c1 100644 --- a/hw/virtio-serial-bus.c +++ b/hw/virtio-serial-bus.c @@ -374,10 +374,13 @@ static void virtio_serial_save(QEMUFile *f, void *opaque) /* Items in struct VirtIOSerial */ +qemu_put_be32s(f, s-bus-max_nr_ports); + /* Do this because we might have hot-unplugged some ports */ nr_active_ports = 0; -QTAILQ_FOREACH(port, s-ports, next) +QTAILQ_FOREACH(port, s-ports, next) { nr_active_ports++; +} qemu_put_be32s(f, nr_active_ports); @@ -399,7 +402,7 @@ static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id) { VirtIOSerial *s = opaque; VirtIOSerialPort *port; -uint32_t nr_active_ports; +uint32_t max_nr_ports, nr_active_ports; unsigned int i; if (version_id 2) { @@ -420,6 +423,12 @@ static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id) /* Items in struct VirtIOSerial */ +qemu_get_be32s(f, max_nr_ports); +if (max_nr_ports s-bus-max_nr_ports) { +/* Source could have more ports than us. Fail migration. */ +return -EINVAL; +} + qemu_get_be32s(f, nr_active_ports); /* Items in struct VirtIOSerialPort */ -- 1.6.2.5
[Qemu-devel] [PATCH 02/15] virtio-serial: save/load: Ensure nr_ports on src and dest are same.
The number of ports on the source as well as the destination machines should match. If they don't, it means some ports that got hotplugged on the source aren't instantiated on the destination. Or that ports that were hot-unplugged on the source are created on the destination. Signed-off-by: Amit Shah amit.s...@redhat.com Reported-by: Juan Quintela quint...@redhat.com --- hw/virtio-serial-bus.c | 18 -- 1 files changed, 16 insertions(+), 2 deletions(-) diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c index 9a7f0c1..d31e62d 100644 --- a/hw/virtio-serial-bus.c +++ b/hw/virtio-serial-bus.c @@ -402,7 +402,7 @@ static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id) { VirtIOSerial *s = opaque; VirtIOSerialPort *port; -uint32_t max_nr_ports, nr_active_ports; +uint32_t max_nr_ports, nr_active_ports, nr_ports; unsigned int i; if (version_id 2) { @@ -419,7 +419,21 @@ static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id) /* The config space */ qemu_get_be16s(f, s-config.cols); qemu_get_be16s(f, s-config.rows); -s-config.nr_ports = qemu_get_be32(f); +nr_ports = qemu_get_be32(f); + +if (nr_ports != s-config.nr_ports) { +/* + * Source hot-plugged/unplugged ports and we don't have all of + * them here. + * + * Note: This condition cannot check for all hotplug/unplug + * events: eg, if one port was hot-plugged and one was + * unplugged, the nr_ports remains the same but the port id's + * would have changed and we won't catch it here. A later + * check for !find_port_by_id() will confirm if this happened. + */ +return -EINVAL; +} /* Items in struct VirtIOSerial */ -- 1.6.2.5
[Qemu-devel] [PATCH 04/15] virtio-serial: save/load: Send target host connection status if different
If the host connection to a port is closed on the destination machine after migration, whereas the connection was open on the source, the guest has to be informed of that. Similar for a host connection open on the destination. Signed-off-by: Amit Shah amit.s...@redhat.com --- hw/virtio-serial-bus.c | 11 +++ 1 files changed, 11 insertions(+), 0 deletions(-) diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c index 5316ef6..484dc94 100644 --- a/hw/virtio-serial-bus.c +++ b/hw/virtio-serial-bus.c @@ -395,6 +395,7 @@ static void virtio_serial_save(QEMUFile *f, void *opaque) */ qemu_put_be32s(f, port-id); qemu_put_byte(f, port-guest_connected); +qemu_put_byte(f, port-host_connected); } } @@ -448,6 +449,7 @@ static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id) /* Items in struct VirtIOSerialPort */ for (i = 0; i nr_active_ports; i++) { uint32_t id; +bool host_connected; id = qemu_get_be32(f); port = find_port_by_id(s, id); @@ -460,6 +462,15 @@ static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id) } port-guest_connected = qemu_get_byte(f); +host_connected = qemu_get_byte(f); +if (host_connected != port-host_connected) { +/* + * We have to let the guest know of the host connection + * status change + */ +send_control_event(port, VIRTIO_CONSOLE_PORT_OPEN, + port-host_connected); +} } return 0; -- 1.6.2.5
[Qemu-devel] [PATCH 05/15] virtio-serial: Use control messages to notify guest of new ports
Allow the port 'id's to be set by a user on the command line. This is needed by management apps that will want a stable port numbering scheme for hot-plug/unplug and migration. Since the port numbers are shared with the guest (to identify ports in control messages), we just send a control message to the guest indicating addition of new ports (hot-plug) or notifying the guest of the available ports when the guest sends us a DEVICE_READY control message. Signed-off-by: Amit Shah amit.s...@redhat.com --- hw/virtio-console.c|2 + hw/virtio-serial-bus.c | 181 +++- hw/virtio-serial.h | 17 +++-- 3 files changed, 130 insertions(+), 70 deletions(-) diff --git a/hw/virtio-console.c b/hw/virtio-console.c index bd44ec6..17b221d 100644 --- a/hw/virtio-console.c +++ b/hw/virtio-console.c @@ -99,6 +99,7 @@ static VirtIOSerialPortInfo virtconsole_info = { .exit = virtconsole_exitfn, .qdev.props = (Property[]) { DEFINE_PROP_UINT8(is_console, VirtConsole, port.is_console, 1), +DEFINE_PROP_UINT32(nr, VirtConsole, port.id, VIRTIO_CONSOLE_BAD_ID), DEFINE_PROP_CHR(chardev, VirtConsole, chr), DEFINE_PROP_STRING(name, VirtConsole, port.name), DEFINE_PROP_END_OF_LIST(), @@ -133,6 +134,7 @@ static VirtIOSerialPortInfo virtserialport_info = { .init = virtserialport_initfn, .exit = virtconsole_exitfn, .qdev.props = (Property[]) { +DEFINE_PROP_UINT32(nr, VirtConsole, port.id, VIRTIO_CONSOLE_BAD_ID), DEFINE_PROP_CHR(chardev, VirtConsole, chr), DEFINE_PROP_STRING(name, VirtConsole, port.name), DEFINE_PROP_END_OF_LIST(), diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c index 484dc94..00e8616 100644 --- a/hw/virtio-serial-bus.c +++ b/hw/virtio-serial-bus.c @@ -41,6 +41,10 @@ struct VirtIOSerial { VirtIOSerialBus *bus; QTAILQ_HEAD(, VirtIOSerialPort) ports; + +/* bitmap for identifying active ports */ +uint32_t *ports_map; + struct virtio_console_config config; }; @@ -48,6 +52,10 @@ static VirtIOSerialPort *find_port_by_id(VirtIOSerial *vser, uint32_t id) { VirtIOSerialPort *port; +if (id == VIRTIO_CONSOLE_BAD_ID) { +return NULL; +} + QTAILQ_FOREACH(port, vser-ports, next) { if (port-id == id) return port; @@ -208,14 +216,25 @@ static void handle_control_message(VirtIOSerial *vser, void *buf) size_t buffer_len; gcpkt = buf; -port = find_port_by_id(vser, ldl_p(gcpkt-id)); -if (!port) -return; cpkt.event = lduw_p(gcpkt-event); cpkt.value = lduw_p(gcpkt-value); +port = find_port_by_id(vser, ldl_p(gcpkt-id)); +if (!port cpkt.event != VIRTIO_CONSOLE_DEVICE_READY) +return; + switch(cpkt.event) { +case VIRTIO_CONSOLE_DEVICE_READY: +/* + * The device is up, we can now tell the device about all the + * ports we have here. + */ +QTAILQ_FOREACH(port, vser-ports, next) { +send_control_event(port, VIRTIO_CONSOLE_PORT_ADD, 1); +} +break; + case VIRTIO_CONSOLE_PORT_READY: /* * Now that we know the guest asked for the port name, we're @@ -370,13 +389,16 @@ static void virtio_serial_save(QEMUFile *f, void *opaque) /* The config space */ qemu_put_be16s(f, s-config.cols); qemu_put_be16s(f, s-config.rows); -qemu_put_be32s(f, s-config.nr_ports); -/* Items in struct VirtIOSerial */ +qemu_put_be32s(f, s-config.max_nr_ports); + +/* The ports map */ -qemu_put_be32s(f, s-bus-max_nr_ports); +qemu_put_buffer(f, (uint8_t *)s-ports_map, +sizeof(uint32_t) * (s-config.max_nr_ports + 31) / 32); + +/* Ports */ -/* Do this because we might have hot-unplugged some ports */ nr_active_ports = 0; QTAILQ_FOREACH(port, s-ports, next) { nr_active_ports++; @@ -388,11 +410,6 @@ static void virtio_serial_save(QEMUFile *f, void *opaque) * Items in struct VirtIOSerialPort. */ QTAILQ_FOREACH(port, s-ports, next) { -/* - * We put the port number because we may not have an active - * port at id 0 that's reserved for a console port, or in case - * of ports that might have gotten unplugged - */ qemu_put_be32s(f, port-id); qemu_put_byte(f, port-guest_connected); qemu_put_byte(f, port-host_connected); @@ -403,7 +420,8 @@ static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id) { VirtIOSerial *s = opaque; VirtIOSerialPort *port; -uint32_t max_nr_ports, nr_active_ports, nr_ports; +size_t ports_map_size; +uint32_t max_nr_ports, nr_active_ports, *ports_map; unsigned int i; if (version_id 2) { @@ -420,29 +438,28 @@ static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id) /* The config space */
[Qemu-devel] [PATCH 06/15] virtio-serial: whitespace: match surrounding code
The virtio-serial code doesn't mix declarations and definitions, so separate them out on different lines. Signed-off-by: Amit Shah amit.s...@redhat.com --- hw/virtio-serial-bus.c |5 - 1 files changed, 4 insertions(+), 1 deletions(-) diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c index 00e8616..80f0259 100644 --- a/hw/virtio-serial-bus.c +++ b/hw/virtio-serial-bus.c @@ -354,7 +354,10 @@ static void handle_input(VirtIODevice *vdev, VirtQueue *vq) static uint32_t get_features(VirtIODevice *vdev, uint32_t features) { -VirtIOSerial *vser = DO_UPCAST(VirtIOSerial, vdev, vdev); +VirtIOSerial *vser; + +vser = DO_UPCAST(VirtIOSerial, vdev, vdev); + if (vser-bus-max_nr_ports 1) { features |= (1 VIRTIO_CONSOLE_F_MULTIPORT); } -- 1.6.2.5
[Qemu-devel] [PATCH 08/15] virtio-serial: Update copyright year to 2010
Signed-off-by: Amit Shah amit.s...@redhat.com --- hw/virtio-console.c|2 +- hw/virtio-serial-bus.c |2 +- hw/virtio-serial.h |2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/hw/virtio-console.c b/hw/virtio-console.c index 17b221d..6b8 100644 --- a/hw/virtio-console.c +++ b/hw/virtio-console.c @@ -1,7 +1,7 @@ /* * Virtio Console and Generic Serial Port Devices * - * Copyright Red Hat, Inc. 2009 + * Copyright Red Hat, Inc. 2009, 2010 * * Authors: * Amit Shah amit.s...@redhat.com diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c index 4435c62..a19e751 100644 --- a/hw/virtio-serial-bus.c +++ b/hw/virtio-serial-bus.c @@ -1,7 +1,7 @@ /* * A bus for connecting virtio serial and console ports * - * Copyright (C) 2009 Red Hat, Inc. + * Copyright (C) 2009, 2010 Red Hat, Inc. * * Author(s): * Amit Shah amit.s...@redhat.com diff --git a/hw/virtio-serial.h b/hw/virtio-serial.h index 0548689..f023873 100644 --- a/hw/virtio-serial.h +++ b/hw/virtio-serial.h @@ -2,7 +2,7 @@ * Virtio Serial / Console Support * * Copyright IBM, Corp. 2008 - * Copyright Red Hat, Inc. 2009 + * Copyright Red Hat, Inc. 2009, 2010 * * Authors: * Christian Ehrhardt ehrha...@linux.vnet.ibm.com -- 1.6.2.5
[Qemu-devel] [PATCH 09/15] virtio-serial: Propagate errors in initialising ports / devices in guest
If adding of ports or devices in the guest fails we can send out a QMP event so that management software can deal with it. Signed-off-by: Amit Shah amit.s...@redhat.com --- hw/virtio-serial-bus.c | 10 ++ 1 files changed, 10 insertions(+), 0 deletions(-) diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c index a19e751..33083af 100644 --- a/hw/virtio-serial-bus.c +++ b/hw/virtio-serial-bus.c @@ -223,6 +223,11 @@ static void handle_control_message(VirtIOSerial *vser, void *buf) switch(cpkt.event) { case VIRTIO_CONSOLE_DEVICE_READY: +if (!cpkt.value) { +error_report(virtio-serial-bus: Guest failure in adding device %s\n, +vser-bus-qbus.name); +break; +} /* * The device is up, we can now tell the device about all the * ports we have here. @@ -233,6 +238,11 @@ static void handle_control_message(VirtIOSerial *vser, void *buf) break; case VIRTIO_CONSOLE_PORT_READY: +if (!cpkt.value) { +error_report(virtio-serial-bus: Guest failure in adding port %u for device %s\n, + port-id, vser-bus-qbus.name); +break; +} /* * Now that we know the guest asked for the port name, we're * sure the guest has initialised whatever state is necessary -- 1.6.2.5
[Qemu-devel] [PATCH 07/15] virtio-serial: Remove redundant check for 0-sized write request
The check for a 0-sized write request to a guest port is not necessary; the while loop below won't be executed in this case and all will be fine. Signed-off-by: Amit Shah amit.s...@redhat.com --- hw/virtio-serial-bus.c |3 --- 1 files changed, 0 insertions(+), 3 deletions(-) diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c index 80f0259..4435c62 100644 --- a/hw/virtio-serial-bus.c +++ b/hw/virtio-serial-bus.c @@ -91,9 +91,6 @@ static size_t write_to_port(VirtIOSerialPort *port, if (!virtio_queue_ready(vq)) { return 0; } -if (!size) { -return 0; -} while (offset size) { int i; -- 1.6.2.5
[Qemu-devel] [PATCH 10/15] virtio-serial: Add QMP events for failed port/device add
When adding a port or a device to the guest fails, management software might be interested in knowing and then cleaning up the host-side of the port. Introduce QMP events to signal such errors. Signed-off-by: Amit Shah amit.s...@redhat.com CC: Luiz Capitulino lcapitul...@redhat.com --- QMP/qmp-events.txt | 48 hw/virtio-serial-bus.c | 15 +++ monitor.c |3 +++ monitor.h |1 + 4 files changed, 67 insertions(+), 0 deletions(-) diff --git a/QMP/qmp-events.txt b/QMP/qmp-events.txt index a94e9b4..f13cf45 100644 --- a/QMP/qmp-events.txt +++ b/QMP/qmp-events.txt @@ -188,3 +188,51 @@ Example: Note: If action is reset, shutdown, or pause the WATCHDOG event is followed respectively by the RESET, SHUTDOWN, or STOP events. + +VIRTIO_SERIAL +- + +Emitted when errors occur in guest port add or guest device add. + +Data: + +- device: The virtio-serial device that triggered the event {json-string} + This is the name given to the bus on the command line: +-device virtio-serial,id=foo + here, the device name is foo + +- port: The port number that triggered the event {json-number} + This is the number given to the port on the command line: +-device virtserialport,nr=2 + here, the port number is 2. If not mentioned on the command + line, the number is auto-assigned. + +- result: The result of the operation {json-string} + This is one of the following: + pass, fail + +- operation: The operation that triggered the event {json-sring} + This is one of the following: + port_add, device_add + +Example: + +Port 0 add failure in the guest: + +{ timestamp: {seconds: 1269438649, microseconds: 851170}, + event: VIRTIO_SERIAL, + data: { +device: virtio-serial-bus.0, +port: 0, +result: fail, +operation: port_add } } + +Device add failure in the guest: + +{ timestamp: {seconds: 1269438702, microseconds: 78114}, + event: VIRTIO_SERIAL, + data: { +device: virtio-serial-bus.0, +port: 4294967295, +result: fail, +operation: device_add } } diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c index 33083af..efcc66c 100644 --- a/hw/virtio-serial-bus.c +++ b/hw/virtio-serial-bus.c @@ -16,6 +16,7 @@ */ #include monitor.h +#include qemu-objects.h #include qemu-queue.h #include sysbus.h #include virtio-serial.h @@ -204,6 +205,17 @@ size_t virtio_serial_guest_ready(VirtIOSerialPort *port) return 0; } +static void mon_event(const char *device, const uint32_t port_id, + const char *operation, const char *result) +{ +QObject *data; + +data = qobject_from_jsonf({ 'device': %s, 'port': %ld, 'operation': %s, 'result': %s }, + device, (int64_t)port_id, operation, result); +monitor_protocol_event(QEVENT_VIRTIO_SERIAL, data); +qobject_decref(data); +} + /* Guest wants to notify us of some event */ static void handle_control_message(VirtIOSerial *vser, void *buf) { @@ -226,6 +238,8 @@ static void handle_control_message(VirtIOSerial *vser, void *buf) if (!cpkt.value) { error_report(virtio-serial-bus: Guest failure in adding device %s\n, vser-bus-qbus.name); +mon_event(vser-bus-qbus.name, VIRTIO_CONSOLE_BAD_ID, + device_add, fail); break; } /* @@ -241,6 +255,7 @@ static void handle_control_message(VirtIOSerial *vser, void *buf) if (!cpkt.value) { error_report(virtio-serial-bus: Guest failure in adding port %u for device %s\n, port-id, vser-bus-qbus.name); +mon_event(vser-bus-qbus.name, port-id, port_add, fail); break; } /* diff --git a/monitor.c b/monitor.c index 0448a70..9e5bfe7 100644 --- a/monitor.c +++ b/monitor.c @@ -441,6 +441,9 @@ void monitor_protocol_event(MonitorEvent event, QObject *data) case QEVENT_WATCHDOG: event_name = WATCHDOG; break; +case QEVENT_VIRTIO_SERIAL: +event_name = VIRTIO_SERIAL; +break; default: abort(); break; diff --git a/monitor.h b/monitor.h index bd4ae34..ea4df8a 100644 --- a/monitor.h +++ b/monitor.h @@ -28,6 +28,7 @@ typedef enum MonitorEvent { QEVENT_BLOCK_IO_ERROR, QEVENT_RTC_CHANGE, QEVENT_WATCHDOG, +QEVENT_VIRTIO_SERIAL, QEVENT_MAX, } MonitorEvent; -- 1.6.2.5
[Qemu-devel] [PATCH 11/15] virtio-serial: Send out guest data to ports only if port is opened
Data should be written only when ports are open. Signed-off-by: Amit Shah amit.s...@redhat.com --- hw/virtio-serial-bus.c |5 + 1 files changed, 5 insertions(+), 0 deletions(-) diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c index efcc66c..80fbff4 100644 --- a/hw/virtio-serial-bus.c +++ b/hw/virtio-serial-bus.c @@ -350,6 +350,11 @@ static void handle_output(VirtIODevice *vdev, VirtQueue *vq) goto next_buf; } + if (!port-host_connected) { +ret = 0; +goto next_buf; +} + /* * A port may not have any handler registered for consuming the * data that the guest sends or it may not have a chardev associated -- 1.6.2.5
[Qemu-devel] [PATCH 12/15] iov: Introduce a new file for helpers around iovs, add iov_from_buf()
The virtio-net code uses iov_fill() which fills an iov from a linear buffer. The virtio-serial-bus code does something similar in an open-coded function. Create a new iov.c file that has iov_from_buf(). Convert virtio-net and virtio-serial-bus over to use this functionality. virtio-net used ints to hold sizes, the new function is going to use size_t types. Later commits will add the opposite functionality -- going from an iov to a linear buffer. Signed-off-by: Amit Shah amit.s...@redhat.com --- Makefile.objs |1 + hw/iov.c | 33 + hw/iov.h | 16 hw/virtio-net.c| 20 +++- hw/virtio-serial-bus.c | 15 +++ 5 files changed, 60 insertions(+), 25 deletions(-) create mode 100644 hw/iov.c create mode 100644 hw/iov.h diff --git a/Makefile.objs b/Makefile.objs index 281f7a6..212ae1d 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -127,6 +127,7 @@ user-obj-y += cutils.o cache-utils.o # libhw hw-obj-y = +hw-obj-y += iov.o hw-obj-y += loader.o hw-obj-y += virtio.o virtio-console.o virtio-pci.o hw-obj-y += fw_cfg.o pci.o pci_host.o pcie_host.o diff --git a/hw/iov.c b/hw/iov.c new file mode 100644 index 000..07bd499 --- /dev/null +++ b/hw/iov.c @@ -0,0 +1,33 @@ +/* + * Helpers for getting linearized buffers from iov / filling buffers into iovs + * + * Copyright IBM, Corp. 2007, 2008 + * Copyright (C) 2010 Red Hat, Inc. + * + * Author(s): + * Anthony Liguori aligu...@us.ibm.com + * Amit Shah amit.s...@redhat.com + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. + */ + +#include iov.h + +size_t iov_from_buf(struct iovec *iov, unsigned int iovcnt, +const void *buf, size_t size) +{ +size_t offset; +unsigned int i; + +offset = 0; +for (i = 0; offset size i iovcnt; i++) { +size_t len; + +len = MIN(iov[i].iov_len, size - offset); + +memcpy(iov[i].iov_base, buf + offset, len); +offset += len; +} +return offset; +} diff --git a/hw/iov.h b/hw/iov.h new file mode 100644 index 000..5e3e541 --- /dev/null +++ b/hw/iov.h @@ -0,0 +1,16 @@ +/* + * Helpers for getting linearized buffers from iov / filling buffers into iovs + * + * Copyright (C) 2010 Red Hat, Inc. + * + * Author(s): + * Amit Shah amit.s...@redhat.com + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. + */ + +#include qemu-common.h + +size_t iov_from_buf(struct iovec *iov, unsigned int iovcnt, +const void *buf, size_t size); diff --git a/hw/virtio-net.c b/hw/virtio-net.c index be33c68..273e7f9 100644 --- a/hw/virtio-net.c +++ b/hw/virtio-net.c @@ -11,6 +11,7 @@ * */ +#include iov.h #include virtio.h #include net.h #include net/checksum.h @@ -423,21 +424,6 @@ static void work_around_broken_dhclient(struct virtio_net_hdr *hdr, } } -static int iov_fill(struct iovec *iov, int iovcnt, const void *buf, int count) -{ -int offset, i; - -offset = i = 0; -while (offset count i iovcnt) { -int len = MIN(iov[i].iov_len, count - offset); -memcpy(iov[i].iov_base, buf + offset, len); -offset += len; -i++; -} - -return offset; -} - static int receive_header(VirtIONet *n, struct iovec *iov, int iovcnt, const void *buf, size_t size, size_t hdr_len) { @@ -573,8 +559,8 @@ static ssize_t virtio_net_receive(VLANClientState *nc, const uint8_t *buf, size_ } /* copy in packet. ugh */ -len = iov_fill(sg, elem.in_num, - buf + offset, size - offset); +len = iov_from_buf(sg, elem.in_num, + buf + offset, size - offset); total += len; /* signal other side */ diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c index 80fbff4..bd1223e 100644 --- a/hw/virtio-serial-bus.c +++ b/hw/virtio-serial-bus.c @@ -15,6 +15,7 @@ * the COPYING file in the top-level directory. */ +#include iov.h #include monitor.h #include qemu-objects.h #include qemu-queue.h @@ -85,27 +86,25 @@ static size_t write_to_port(VirtIOSerialPort *port, { VirtQueueElement elem; VirtQueue *vq; -size_t offset = 0; -size_t len = 0; +size_t offset; vq = port-ivq; if (!virtio_queue_ready(vq)) { return 0; } +offset = 0; while (offset size) { -int i; +size_t len; if (!virtqueue_pop(vq, elem)) { break; } -for (i = 0; offset size i elem.in_num; i++) { -len = MIN(elem.in_sg[i].iov_len, size - offset); +len = iov_from_buf(elem.in_sg, elem.in_num, + buf + offset, size - offset); +offset += len; -memcpy(elem.in_sg[i].iov_base, buf + offset,
[Qemu-devel] [PATCH 13/15] iov: Add iov_to_buf and iov_size helpers
iov_to_buf() puts the buffer contents in the iov in a linearized buffer. iov_size() gets the length of the contents in the iov. The iov_to_buf() function is the memcpy_to_iovec() function that was used in virtio-ballon.c. Signed-off-by: Amit Shah amit.s...@redhat.com --- hw/iov.c| 37 + hw/iov.h|3 +++ hw/virtio-balloon.c | 35 --- 3 files changed, 44 insertions(+), 31 deletions(-) diff --git a/hw/iov.c b/hw/iov.c index 07bd499..d4013cd 100644 --- a/hw/iov.c +++ b/hw/iov.c @@ -31,3 +31,40 @@ size_t iov_from_buf(struct iovec *iov, unsigned int iovcnt, } return offset; } + +size_t iov_to_buf(struct iovec *iov, unsigned int iovcnt, + void *buf, size_t offset, size_t size) +{ +uint8_t *ptr; +size_t iov_off, buf_off; +unsigned int i; + +ptr = buf; +iov_off = 0; +buf_off = 0; +for (i = 0; i iovcnt size; i++) { +if (offset (iov_off + iov[i].iov_len)) { +size_t len = MIN((iov_off + iov[i].iov_len) - offset , size); + +memcpy(ptr + buf_off, iov[i].iov_base + (offset - iov_off), len); + +buf_off += len; +offset += len; +size -= len; +} +iov_off += iov[i].iov_len; +} +return buf_off; +} + +size_t iov_size(struct iovec *iov, unsigned int iovcnt) +{ +size_t len; +unsigned int i; + +len = 0; +for (i = 0; i iovcnt; i++) { +len += iov[i].iov_len; +} +return len; +} diff --git a/hw/iov.h b/hw/iov.h index 5e3e541..c977ff1 100644 --- a/hw/iov.h +++ b/hw/iov.h @@ -14,3 +14,6 @@ size_t iov_from_buf(struct iovec *iov, unsigned int iovcnt, const void *buf, size_t size); +size_t iov_to_buf(struct iovec *iov, unsigned int iovcnt, + void *buf, size_t offset, size_t size); +size_t iov_size(struct iovec *iov, unsigned int iovcnt); diff --git a/hw/virtio-balloon.c b/hw/virtio-balloon.c index 6d12024..4414eae 100644 --- a/hw/virtio-balloon.c +++ b/hw/virtio-balloon.c @@ -11,6 +11,7 @@ * */ +#include iov.h #include qemu-common.h #include virtio.h #include pc.h @@ -91,33 +92,6 @@ static QObject *get_stats_qobject(VirtIOBalloon *dev) return QOBJECT(dict); } -/* FIXME: once we do a virtio refactoring, this will get subsumed into common - * code */ -static size_t memcpy_from_iovector(void *data, size_t offset, size_t size, - struct iovec *iov, int iovlen) -{ -int i; -uint8_t *ptr = data; -size_t iov_off = 0; -size_t data_off = 0; - -for (i = 0; i iovlen size; i++) { -if (offset (iov_off + iov[i].iov_len)) { -size_t len = MIN((iov_off + iov[i].iov_len) - offset , size); - -memcpy(ptr + data_off, iov[i].iov_base + (offset - iov_off), len); - -data_off += len; -offset += len; -size -= len; -} - -iov_off += iov[i].iov_len; -} - -return data_off; -} - static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq) { VirtIOBalloon *s = to_virtio_balloon(vdev); @@ -127,8 +101,7 @@ static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq) size_t offset = 0; uint32_t pfn; -while (memcpy_from_iovector(pfn, offset, 4, -elem.out_sg, elem.out_num) == 4) { +while (iov_to_buf(elem.out_sg, elem.out_num, pfn, offset, 4) == 4) { ram_addr_t pa; ram_addr_t addr; @@ -180,8 +153,8 @@ static void virtio_balloon_receive_stats(VirtIODevice *vdev, VirtQueue *vq) */ reset_stats(s); -while (memcpy_from_iovector(stat, offset, sizeof(stat), elem-out_sg, -elem-out_num) == sizeof(stat)) { +while (iov_to_buf(elem-out_sg, elem-out_num, stat, offset, sizeof(stat)) + == sizeof(stat)) { uint16_t tag = tswap16(stat.tag); uint64_t val = tswap64(stat.val); -- 1.6.2.5
[Qemu-devel] [PATCH 14/15] virtio-serial: Handle scatter-gather buffers for control messages
Current control messages are small enough to not be split into multiple buffers but we could run into such a situation in the future or a malicious guest could cause such a situation. So handle the entire iov request for control messages. Also ensure the size of the control request is = what we expect otherwise we risk accessing memory that we don't own. Signed-off-by: Amit Shah amit.s...@redhat.com CC: Avi Kivity a...@redhat.com Reported-by: Avi Kivity a...@redhat.com --- hw/virtio-serial-bus.c | 34 +++--- 1 files changed, 31 insertions(+), 3 deletions(-) diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c index bd1223e..3edfeca 100644 --- a/hw/virtio-serial-bus.c +++ b/hw/virtio-serial-bus.c @@ -216,7 +216,7 @@ static void mon_event(const char *device, const uint32_t port_id, } /* Guest wants to notify us of some event */ -static void handle_control_message(VirtIOSerial *vser, void *buf) +static void handle_control_message(VirtIOSerial *vser, void *buf, size_t len) { struct VirtIOSerialPort *port; struct virtio_console_control cpkt, *gcpkt; @@ -225,6 +225,11 @@ static void handle_control_message(VirtIOSerial *vser, void *buf) gcpkt = buf; +if (len sizeof(cpkt)) { +/* The guest sent an invalid control packet */ +return; +} + cpkt.event = lduw_p(gcpkt-event); cpkt.value = lduw_p(gcpkt-value); @@ -321,12 +326,35 @@ static void control_out(VirtIODevice *vdev, VirtQueue *vq) { VirtQueueElement elem; VirtIOSerial *vser; +uint8_t *buf; +size_t len; vser = DO_UPCAST(VirtIOSerial, vdev, vdev); +len = 0; +buf = NULL; while (virtqueue_pop(vq, elem)) { -handle_control_message(vser, elem.out_sg[0].iov_base); -virtqueue_push(vq, elem, elem.out_sg[0].iov_len); +size_t cur_len, copied; + +cur_len = iov_size(elem.out_sg, elem.out_num); +/* + * Allocate a new buf only if we didn't have one previously or + * if the size of the buf differs + */ +if (cur_len != len) { +if (len) { +qemu_free(buf); +} +buf = qemu_malloc(cur_len); +len = cur_len; +} +copied = iov_to_buf(elem.out_sg, elem.out_num, buf, 0, len); + +handle_control_message(vser, buf, copied); +virtqueue_push(vq, elem, copied); +} +if (len) { +qemu_free(buf); } virtio_notify(vdev, vq); } -- 1.6.2.5
[Qemu-devel] [PATCH 15/15] virtio-serial: Handle scatter/gather input from the guest
Current guests don't send more than one iov but it can change later. Ensure we handle that case. Signed-off-by: Amit Shah amit.s...@redhat.com CC: Avi Kivity a...@redhat.com --- hw/virtio-serial-bus.c | 12 1 files changed, 8 insertions(+), 4 deletions(-) diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c index 3edfeca..42e4ed0 100644 --- a/hw/virtio-serial-bus.c +++ b/hw/virtio-serial-bus.c @@ -369,7 +369,8 @@ static void handle_output(VirtIODevice *vdev, VirtQueue *vq) while (virtqueue_pop(vq, elem)) { VirtIOSerialPort *port; -size_t ret; +uint8_t *buf; +size_t ret, buf_size; port = find_port_by_vq(vser, vq); if (!port) { @@ -392,9 +393,12 @@ static void handle_output(VirtIODevice *vdev, VirtQueue *vq) goto next_buf; } -/* The guest always sends only one sg */ -ret = port-info-have_data(port, elem.out_sg[0].iov_base, -elem.out_sg[0].iov_len); +buf_size = iov_size(elem.out_sg, elem.out_num); +buf = qemu_malloc(buf_size); +ret = iov_to_buf(elem.out_sg, elem.out_num, buf, 0, buf_size); + +ret = port-info-have_data(port, buf, ret); +qemu_free(buf); next_buf: virtqueue_push(vq, elem, ret); -- 1.6.2.5
Re: [Qemu-devel] Re: [libvirt] Supporting hypervisor specific APIs in libvirt
Paul Brook p...@codesourcery.com writes: IMO the no_user flag is a bug, and should not exist. Sorry, what's that? Usually an indication that a device has been incorrectly or inproperly converted to the qdev interface. Can also be an indication that the device can't support multiple instances. For instance: commit 39a51dfda835a75c0ebbfd92705b96e4de77f795 Author: Markus Armbruster arm...@redhat.com Date: Tue Oct 27 13:52:13 2009 +0100 qdev: Tag isa-fdc, PIIX3 IDE and PIIX4 IDE as no-user These devices are created automatically, and attempting to create another one with -device fails with qemu: hardware error: register_ioport_write: invalid opaque. Signed-off-by: Markus Armbruster arm...@redhat.com Signed-off-by: Anthony Liguori aligu...@us.ibm.com With no-user, we at least fail with a decent error message. I don't think it's fair to demand that a qdev conversion must relax restrictions that haven't otherwise bothered us to be correct and proper. We'll relax them if and when they bother us enough to make somebody send a decent patch. And yes, there are better ways to disallow multiple instances of a device than declaring it no-user. Patches welcome.
Re: [Qemu-devel] [PATCH] qemu: jaso-parser: Output the content of invalid keyword
Amos Kong ak...@redhat.com writes: When input some invialid word 'unknowcmd' through QMP port, qemu outputs this error message: parse error: invalid keyword `%s' This patch makes qemu output the content of invalid keyword, like: parse error: invalid keyword `unknowcmd' Signed-off-by: Amos Kong ak...@redhat.com Looks good to me. Hint: it's best to put a version in the subject when you respin, like [PATCH v2] ...
Re: [Qemu-devel] Re: [libvirt] Supporting hypervisor specific APIs in libvirt
Paul Brook p...@codesourcery.com writes: IMO the no_user flag is a bug, and should not exist. Sorry, what's that? Usually an indication that a device has been incorrectly or inproperly converted to the qdev interface. Can also be an indication that the device can't support multiple instances. For instance: qdev: Tag isa-fdc, PIIX3 IDE and PIIX4 IDE as no-user I still claim this is a bug, and see no good reason why we shouldn't support multiple floppy controllers, ISA busses, etc. Paul
Re: [Qemu-devel] Re: Compile files only once: some planning
On 03/24/2010 03:56 PM, Paul Brook wrote: On 03/24/2010 12:19 PM, Richard Henderson wrote: On 03/24/2010 02:47 AM, Paolo Bonzini wrote: 1) make CPUState define only common fields. Include CPUState at the beginning of each per-target CPUXYZState. Irritatingly, the common fields contain quite big TLBs. And the offsets from the start of env affect the compactness of the code generated from TCG. We really really want the general registers to come first to make sure that those offsets fit the host's reg+offset addressing mode. What about adding a 512-bytes (or more) block or something like that at the beginning of CPUState with a union, so you can put the per-target stuff there? Is it really worth the hassle? Anything touching CPUState is probably going to be CPU specific anyway. qemu-timer.c, hw/dma.c is not and these are the first two files I looked at. translate-all.c is the third, and it is except for a trivial cleanup. Big parts of vl.c are independent too. As a quick check: $ git grep -l 'CPUState' | grep -ve ^tcg -e ^target- | wc -l 96 $ git grep -l 'CPUState' | grep -ve ^tcg -e ^target- | \ xargs grep -l '#if.*TARGET_' | wc -l 36 The ones that remain are pretty much what would you expect, besides translate-all.c and some in hw/ which I snipped: bsd-user/main.c darwin-user/main.c darwin-user/qemu.h darwin-user/signal.c linux-user/elfload.c linux-user/main.c linux-user/qemu.h linux-user/signal.c linux-user/syscall.c cpu-all.h cpu-defs.h cpu-exec.c def-helper.h disas.c exec-all.h exec.c gdbstub.c monitor.c translate-all.c vl.c Of course this doesn't mean that 60 files are target-independent, but 30-ish probably are or can be made so. It would also help code clarity to use CPUXYZState more, to understand which hw/ files are specific to one model. For hw/s390-virtio.c that's obvious, but slightly less so for hw/sun4m.c and even less so for hw/syborg.c. This is an independent cleanup. Paolo
Re: [Qemu-devel] Re: [libvirt] Supporting hypervisor specific APIs in libvirt
On Wed, 24 Mar 2010 12:42:16 +0200 Avi Kivity a...@redhat.com wrote: So, at best qemud is a toy for people who are annoyed by libvirt. Is the reason for doing this in qemu because libvirt is annoying? I don't see how adding yet another layer/daemon is going to improve ours and user's life (the same applies for libqemu). If I got it right, there were two complaints from the kvm-devel flamewar: 1. Qemu has usability problems 2. There's no way an external tool can get /proc/kallsyms info from Qemu I don't see how libqemu can help with 1) and having qemud doesn't seem the best solution for 2) either. Still talking about 2), what's wrong in getting the PID or having a QMP connection in a well known location as suggested by Anthony?
Re: [Qemu-devel] Re: Compile files only once: some planning
On 03/24/2010 07:45 AM, Paolo Bonzini wrote: On 03/24/2010 12:19 PM, Richard Henderson wrote: On 03/24/2010 02:47 AM, Paolo Bonzini wrote: 1) make CPUState define only common fields. Include CPUState at the beginning of each per-target CPUXYZState. Irritatingly, the common fields contain quite big TLBs. And the offsets from the start of env affect the compactness of the code generated from TCG. We really really want the general registers to come first to make sure that those offsets fit the host's reg+offset addressing mode. What about adding a 512-bytes (or more) block or something like that at the beginning of CPUState with a union, so you can put the per-target stuff there? I think that would be confusing. What might be just as good (although possibly just as confusing) is to move the big members into a different structure. E.g. struct CPUSmallCommonState { // most of the stuff from CPU_COMMON. // sorted for some thought of padding elimination. ;-) }; struct CPULargeCommonState { CPUTLBEntry tlb_table[NB_MMU_MODES][CPU_TLB_SIZE]; target_phys_addr_t iotlb[NB_MMU_MODES][CPU_TLB_SIZE]; struct TranslationBlock *tb_jmp_cache[TB_JMP_CACHE_SIZE]; jmp_buf jmp_env; }; struct CPUXYZSmallState { CPUSmallCommonState common_s; // the rest of the cpu-specific stuff. }; struct CPUXYZLargeState { CPUXYZSmallState s; CPUBigCommonState common_l; }; extern int cpu_large_state_offset = offsetof(CPUXYZLargeState, common_l); Now. If you're compiling a file for which cpu-specific code is ok: register CPUXYZLargeState *env __asm__(AREG0); #define ENV_SMALL_COMMON_STATE(env-s.common_s) #define ENV_LARGE_COMMON_STATE(env-common_l) If you're compiling a file which is supposed to be independant of cpu: register CPUSmallCommonState *env __asm__(AREG0); #define ENV_SMALL_COMMON_STATE(env) #define ENV_LARGE_COMMON_STATE((CPULargeCommonState *)((char *)env + cpu_large_state_offset)) For the gcc-compiled code, the addition of the cpu_large_state_offset is probably more or less on par in efficiency with indirection. But for TCG generated code, the variable read happens at code generation time, which means we *still* have a constant in the generated code. r~
Re: [Qemu-devel] Re: Compile files only once: some planning
1) make CPUState define only common fields. Include CPUState at the beginning of each per-target CPUXYZState. Irritatingly, the common fields contain quite big TLBs. And the offsets from the start of env affect the compactness of the code generated from TCG. We really really want the general registers to come first to make sure that those offsets fit the host's reg+offset addressing mode. What about adding a 512-bytes (or more) block or something like that at the beginning of CPUState with a union, so you can put the per-target stuff there? Is it really worth the hassle? Anything touching CPUState is probably going to be CPU specific anyway. qemu-timer.c, hw/dma.c is not and these are the first two files I looked at. translate-all.c is the third, and it is except for a trivial cleanup. The use in hw/dma.c is incorrect. See previous discussion about how qemu_bh_schedule_idle needs to go away. I'm also unconvinced by your numbers. My i386-softmmu/ directory contains only 43 object files, most of are device emulation and don't touch CPU state at all. arm-softmmu/ contains a good number more, but that's mostly board init (which needs to know which CPU it's creating), and devices that are only used by one board so noone's bothered to move them into libhw. Paul
[Qemu-devel] [PATCH 0/3] Fix S390x target
We're in a bad situation with the S390 qemu target. It only works with KVM, so people can't test it when they don't have access to a real S390 machine. While trying to build and use s390x-softmmu again I stumbled across a couple of issues, all addressed in this patch set. The patches should all not affect non-s390 targets at all. Alexander Graf (3): S390: Add stub for cpu_get_phys_page_debug S390: Tell user why VM creation failed S390: Don't compile in virtio-pci Makefile.objs |3 ++- default-configs/arm-softmmu.mak|1 + default-configs/cris-softmmu.mak |1 + default-configs/i386-softmmu.mak |1 + default-configs/m68k-softmmu.mak |1 + default-configs/microblaze-softmmu.mak |1 + default-configs/mips-softmmu.mak |1 + default-configs/mips64-softmmu.mak |1 + default-configs/mips64el-softmmu.mak |1 + default-configs/mipsel-softmmu.mak |1 + default-configs/ppc-softmmu.mak|1 + default-configs/ppc64-softmmu.mak |1 + default-configs/ppcemb-softmmu.mak |1 + default-configs/sh4-softmmu.mak|1 + default-configs/sh4eb-softmmu.mak |1 + default-configs/sparc-softmmu.mak |1 + default-configs/sparc64-softmmu.mak|1 + default-configs/x86_64-softmmu.mak |1 + kvm-all.c |7 ++- target-s390x/helper.c |5 + 20 files changed, 30 insertions(+), 2 deletions(-)
[Qemu-devel] [PATCH 1/3] S390: Add stub for cpu_get_phys_page_debug
We don't implement any virtual memory in the S390 target so far, so let's add a stub for this now mandatory function. Fixes building of S390 target. Signed-off-by: Alexander Graf ag...@suse.de --- target-s390x/helper.c |5 + 1 files changed, 5 insertions(+), 0 deletions(-) diff --git a/target-s390x/helper.c b/target-s390x/helper.c index ba0c052..4a5297b 100644 --- a/target-s390x/helper.c +++ b/target-s390x/helper.c @@ -58,6 +58,11 @@ void cpu_reset(CPUS390XState *env) tlb_flush(env, 1); } +target_phys_addr_t cpu_get_phys_page_debug(CPUState *env, target_ulong addr) +{ +return 0; +} + #ifndef CONFIG_USER_ONLY int cpu_s390x_handle_mmu_fault (CPUState *env, target_ulong address, int rw, -- 1.6.0.2
[Qemu-devel] [PATCH 3/3] S390: Don't compile in virtio-pci
As soon as virtio-pci.c gets compiled and used on S390 the internal qdev magic gets confused and tries to give us PCI devices instead of S390 virtio devices. Since we don't have PCI on S390, we can safely not compile virtio-pci at all. In order to do this I added a new config option CONFIG_PCI that I enabled for every platform except S390. Thanks to this the change should be a complete nop for every other platform. If anyone feels like their platform shouldn't support PCI either, just remove the config option. If you think we should build even less when we don't have PCI, feel free to come up with a follow-up patch. Signed-off-by: Alexander Graf ag...@suse.de --- Makefile.objs |3 ++- default-configs/arm-softmmu.mak|1 + default-configs/cris-softmmu.mak |1 + default-configs/i386-softmmu.mak |1 + default-configs/m68k-softmmu.mak |1 + default-configs/microblaze-softmmu.mak |1 + default-configs/mips-softmmu.mak |1 + default-configs/mips64-softmmu.mak |1 + default-configs/mips64el-softmmu.mak |1 + default-configs/mipsel-softmmu.mak |1 + default-configs/ppc-softmmu.mak|1 + default-configs/ppc64-softmmu.mak |1 + default-configs/ppcemb-softmmu.mak |1 + default-configs/sh4-softmmu.mak|1 + default-configs/sh4eb-softmmu.mak |1 + default-configs/sparc-softmmu.mak |1 + default-configs/sparc64-softmmu.mak|1 + default-configs/x86_64-softmmu.mak |1 + 18 files changed, 19 insertions(+), 1 deletions(-) diff --git a/Makefile.objs b/Makefile.objs index 281f7a6..a6ce4f5 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -128,7 +128,8 @@ user-obj-y += cutils.o cache-utils.o hw-obj-y = hw-obj-y += loader.o -hw-obj-y += virtio.o virtio-console.o virtio-pci.o +hw-obj-y += virtio.o virtio-console.o +hw-obj-$(CONFIG_PCI) += virtio-pci.o hw-obj-y += fw_cfg.o pci.o pci_host.o pcie_host.o hw-obj-y += watchdog.o hw-obj-$(CONFIG_ISA_MMIO) += isa_mmio.o diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak index 02ad192..a3819e1 100644 --- a/default-configs/arm-softmmu.mak +++ b/default-configs/arm-softmmu.mak @@ -24,3 +24,4 @@ CONFIG_SSI_SD=y CONFIG_LAN9118=y CONFIG_SMC91C111=y CONFIG_DS1338=y +CONFIG_PCI=y diff --git a/default-configs/cris-softmmu.mak b/default-configs/cris-softmmu.mak index 8711402..9377235 100644 --- a/default-configs/cris-softmmu.mak +++ b/default-configs/cris-softmmu.mak @@ -2,3 +2,4 @@ CONFIG_NAND=y CONFIG_PTIMER=y +CONFIG_PCI=y diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak index 4dbf656..192dcb8 100644 --- a/default-configs/i386-softmmu.mak +++ b/default-configs/i386-softmmu.mak @@ -16,3 +16,4 @@ CONFIG_IDE_PIIX=y CONFIG_NE2000_ISA=y CONFIG_PIIX_PCI=y CONFIG_SOUND=y +CONFIG_PCI=y diff --git a/default-configs/m68k-softmmu.mak b/default-configs/m68k-softmmu.mak index 0a78375..07e02d6 100644 --- a/default-configs/m68k-softmmu.mak +++ b/default-configs/m68k-softmmu.mak @@ -2,3 +2,4 @@ CONFIG_GDBSTUB_XML=y CONFIG_PTIMER=y +CONFIG_PCI=y diff --git a/default-configs/microblaze-softmmu.mak b/default-configs/microblaze-softmmu.mak index c800c16..ddc3c15 100644 --- a/default-configs/microblaze-softmmu.mak +++ b/default-configs/microblaze-softmmu.mak @@ -1,3 +1,4 @@ # Default configuration for microblaze-softmmu CONFIG_PTIMER=y +CONFIG_PCI=y diff --git a/default-configs/mips-softmmu.mak b/default-configs/mips-softmmu.mak index 345a093..4598a0d 100644 --- a/default-configs/mips-softmmu.mak +++ b/default-configs/mips-softmmu.mak @@ -16,3 +16,4 @@ CONFIG_IDE_ISA=y CONFIG_IDE_PIIX=y CONFIG_NE2000_ISA=y CONFIG_SOUND=y +CONFIG_PCI=y diff --git a/default-configs/mips64-softmmu.mak b/default-configs/mips64-softmmu.mak index 5900ee6..6c7f3c6 100644 --- a/default-configs/mips64-softmmu.mak +++ b/default-configs/mips64-softmmu.mak @@ -16,3 +16,4 @@ CONFIG_IDE_ISA=y CONFIG_IDE_PIIX=y CONFIG_NE2000_ISA=y CONFIG_SOUND=y +CONFIG_PCI=y diff --git a/default-configs/mips64el-softmmu.mak b/default-configs/mips64el-softmmu.mak index 3e1ba93..2dcac82 100644 --- a/default-configs/mips64el-softmmu.mak +++ b/default-configs/mips64el-softmmu.mak @@ -16,3 +16,4 @@ CONFIG_IDE_ISA=y CONFIG_IDE_PIIX=y CONFIG_NE2000_ISA=y CONFIG_SOUND=y +CONFIG_PCI=y diff --git a/default-configs/mipsel-softmmu.mak b/default-configs/mipsel-softmmu.mak index 17b83d0..e3e3878 100644 --- a/default-configs/mipsel-softmmu.mak +++ b/default-configs/mipsel-softmmu.mak @@ -16,3 +16,4 @@ CONFIG_IDE_ISA=y CONFIG_IDE_PIIX=y CONFIG_NE2000_ISA=y CONFIG_SOUND=y +CONFIG_PCI=y diff --git a/default-configs/ppc-softmmu.mak b/default-configs/ppc-softmmu.mak index 5fe591c..50932fa 100644 --- a/default-configs/ppc-softmmu.mak +++ b/default-configs/ppc-softmmu.mak @@ -15,3 +15,4 @@ CONFIG_IDE_ISA=y CONFIG_IDE_CMD646=y CONFIG_NE2000_ISA=y CONFIG_SOUND=y +CONFIG_PCI=y diff --git
[Qemu-devel] [PATCH 2/3] S390: Tell user why VM creation failed
The KVM kernel module on S390 refuses to create a VM when the switch_amode kernel parameter is not used. Since that is not exactly obvious, let's give the user a nice warning. Signed-off-by: Alexander Graf ag...@suse.de --- kvm-all.c |7 ++- 1 files changed, 6 insertions(+), 1 deletions(-) diff --git a/kvm-all.c b/kvm-all.c index 534ead0..acf7e31 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -609,8 +609,13 @@ int kvm_init(int smp_cpus) } s-vmfd = kvm_ioctl(s, KVM_CREATE_VM, 0); -if (s-vmfd 0) +if (s-vmfd 0) { +#ifdef TARGET_S390X +fprintf(stderr, Please add the 'switch_amode' kernel parameter to +your host kernel command line\n); +#endif goto err; +} /* initially, KVM allocated its own memory and we had to jump through * hooks to make phys_ram_base point to this. Modern versions of KVM -- 1.6.0.2
[Qemu-devel] Re: Compile files only once: some planning
On 3/24/10, Juan Quintela quint...@redhat.com wrote: Blue Swirl blauwir...@gmail.com wrote: Hi, Here's some planning for getting most files compiled as few times as possible. Comments and suggestions are welcome. I took some thought about this at some point. Problems here start from Recursive Makefile condered Harmful (tm). Look at how we jump through hops to be able to compile things in one/other side. We have: Makefile Makefile.target (really lots of them, one for target) Makefile.hw Makefile.user If we had only a single Makefile, things in this department would be much, much easier. And no, convert to a single Makefile is not trivial either, but it would make things easier. Why do we have several Makefiles? Because we want to compile each file with different options. Why do we need to abuse so much VPATH? Because we need to bring files randomly from $(ROOT), $(ROOT)/hw $(ROOT)/$(TARGET). Would it help if the Makefiles were scattered to each directory, for example instead of Makefile.hw we had hw/Makefile? Problem here, there isn't a simple way to compile files for several target just once (no way to put them). Our main copmile rule is: $(QEMU_PROG): $(obj-y) $(obj-$(TARGET_BASE_ARCH)-y) $(call LINK,$(obj-y) $(obj-$(TARGET_BASE_ARCH)-y)) (notice that things compiled in Makefile are trivial, they are already compiled just once by definition, problems are for all the qemu's we compile). We could change: $(obj-$(TARGET_BASE_ARCH)-y) to something like: OBJ-TARGET=s/.o/.$(TARGET_BASE_ARCH).o/ (I forgot the subst Makefile syntax), and have the: %.$(TARGET_BASE_ARCH).o: %.c gcc $(TARGET_BASE_ARCH options) From there, as you suggested, we need some files that are not compiled by architecture, they need to be compiled by board, well, we need to add yet another level obj-$(TARGET_BOARD) or whatever. Notice that this is a lot of work, but you are needing the audit to be able to compile only once. Problem just now is that there is not a simple way to describe that information, with my proposal it gets trivial to express: obj-$(CONFIG_FOO) += foo.o # You need this for everything obj-mips-$(CONFIG_FOO) += foo.o # You need this for all mips boards obj-malta-$(CONFIG_FOO) += foo.o # You need this for all mips malta board You still need to do some different magic from hw-32/64 but it could be done this way. Once you did it this way, you now where the files are (hw or target) and you can drop the VPATH tricks. Problem with this proposal is that it is not trivial to do in little steps, and the real big advantages appear when you switch to a single Makefile at the end. I may have missed something, but the compile process doesn't care about boards, because all boards for some architecture (and therefore all devices used by all boards) are linked to a single per-architecture executable. So why introduce the boards concept to Makefiles? vl.c: a lot of work. Maybe the CPUState stuff should be separated to a new file. That should just be a rule in Documentation. You can't but anything else in vl.c. If you move anything out of vl.c (see timers work from bonzini for example), you get a wild card for free commit bypassing maintainers or some similar price :) Cleaning up vl.c would be great, but just for purpose of single compilation, it's enough to put the CPUState pieces to a target dependent file (cpu-common.c?) and compile the rest once by making TARGET_xxx conditional code unconditional. This may still be doable. rest of files I haven't really looked at them at depth. I looked when I cleaned up the build system, I thought how to do the next step (outlined before), but got sidetracked by other more urgent things. Thanks for the comments.
[Qemu-devel] Re: [RFC] vhost-blk implementation
Michael S. Tsirkin wrote: On Tue, Mar 23, 2010 at 12:55:07PM -0700, Badari Pulavarty wrote: Michael S. Tsirkin wrote: On Tue, Mar 23, 2010 at 10:57:33AM -0700, Badari Pulavarty wrote: Michael S. Tsirkin wrote: On Mon, Mar 22, 2010 at 05:34:04PM -0700, Badari Pulavarty wrote: Write Results: == I see degraded IO performance when doing sequential IO write tests with vhost-blk compared to virtio-blk. # time dd of=/dev/vda if=/dev/zero bs=2M oflag=direct I get ~110MB/sec with virtio-blk, but I get only ~60MB/sec with vhost-blk. Wondering why ? Try to look and number of interrupts and/or number of exits. I checked interrupts and IO exits - there is no major noticeable difference between vhost-blk and virtio-blk scenerios. It could also be that you are overrunning some queue. I don't see any exit mitigation strategy in your patch: when there are already lots of requests in a queue, it's usually a good idea to disable notifications and poll the queue as requests complete. That could help performance. Do you mean poll eventfd for new requests instead of waiting for new notifications ? Where do you do that in vhost-net code ? vhost_disable_notify does this. Unlike network socket, since we are dealing with a file, there is no -poll support for it. So I can't poll for the data. And also, Issue I am having is on the write() side. Not sure I understand. I looked at it some more - I see 512K write requests on the virtio-queue in both vhost-blk and virtio-blk cases. Both qemu or vhost is doing synchronous writes to page cache (there is no write batching in qemu that is affecting this case). I still puzzled on why virtio-blk outperforms vhost-blk. Thanks, Badari If you say the number of requests is the same, we are left with: - requests are smaller for some reason? - something is causing retries? No. IO requests sizes are exactly same (512K) in both cases. There are no retries or errors in both cases. One thing I am not clear is - for some reason guest kernel could push more data into virtio-ring in case of virtio-blk vs vhost-blk. Is this possible ? Does guest gets to run much sooner in virtio-blk case than vhost-blk ? Sorry, if its dumb question - I don't understand all the vhost details :( Thanks, Badari BTW, did you put the backend in non-blocking mode? As I said, vhost net passes non-blocking flag to socket backend, but vfs_write/read that you use does not have an option to do this. So we'll need to extend the backend to fix that, but just for demo purposes, you could set non-blocking mode on the file from userspace. Michael, Atleast I understand why the performance difference now.. My debug code is changed the behaviour of virtio-blk which confused me. 1) virtio-blk is able to batch up writes from various requests. 2) virtio-blk offloads the writes to different thread Where as vhost-blk, I do each request syncrhonously. Hence the difference. You are right - i have to make backend async. I will working on handing off work to in-kernel threads. I am not sure about IO completion handling and calling vhost_add_used_and_signal() out of context. But let me take a stab at it and ask your help if I run into issues. Thanks, Badari
Re: [Qemu-devel] Re: Compile files only once: some planning
On 3/24/10, Richard Henderson r...@twiddle.net wrote: On 03/24/2010 02:47 AM, Paolo Bonzini wrote: 1) make CPUState define only common fields. Include CPUState at the beginning of each per-target CPUXYZState. Irritatingly, the common fields contain quite big TLBs. And the offsets from the start of env affect the compactness of the code generated from TCG. We really really want the general registers to come first to make sure that those offsets fit the host's reg+offset addressing mode. One trick is to define a fixed offset (about half CPUState size) and make env point to CPUState + offset. Then the negative part of the offset space can be used efficiently. But this just doubles the space that can be accessed fast, so it's not a big win.
Re: [Qemu-devel] [PATCH 3/3] S390: Don't compile in virtio-pci
On 3/24/10, Alexander Graf ag...@suse.de wrote: As soon as virtio-pci.c gets compiled and used on S390 the internal qdev magic gets confused and tries to give us PCI devices instead of S390 virtio devices. Since we don't have PCI on S390, we can safely not compile virtio-pci at all. In order to do this I added a new config option CONFIG_PCI that I enabled for every platform except S390. Thanks to this the change should be a complete nop for every other platform. The name should be CONFIG_VIRTIO_PCI, CONFIG_PCI would enable compilation of pci.c. If anyone feels like their platform shouldn't support PCI either, just remove the config option. If you think we should build even less when we don't have PCI, feel free to come up with a follow-up patch. None of the currently supported Sparc32 boards have PCI. There are some real devices with PCI (JavaStations) though.
Re: [Qemu-devel] [PATCH 3/3] S390: Don't compile in virtio-pci
Blue Swirl wrote: On 3/24/10, Alexander Graf ag...@suse.de wrote: As soon as virtio-pci.c gets compiled and used on S390 the internal qdev magic gets confused and tries to give us PCI devices instead of S390 virtio devices. Since we don't have PCI on S390, we can safely not compile virtio-pci at all. In order to do this I added a new config option CONFIG_PCI that I enabled for every platform except S390. Thanks to this the change should be a complete nop for every other platform. The name should be CONFIG_VIRTIO_PCI, CONFIG_PCI would enable compilation of pci.c. My idea was to keep the config option generic and rip out all PCI stuff from s390x-softmmu. We don't even do MMIO :-). Alex
[Qemu-devel] Re: [RFC] vhost-blk implementation
On Wed, Mar 24, 2010 at 10:58:50AM -0700, Badari Pulavarty wrote: Michael S. Tsirkin wrote: On Tue, Mar 23, 2010 at 12:55:07PM -0700, Badari Pulavarty wrote: Michael S. Tsirkin wrote: On Tue, Mar 23, 2010 at 10:57:33AM -0700, Badari Pulavarty wrote: Michael S. Tsirkin wrote: On Mon, Mar 22, 2010 at 05:34:04PM -0700, Badari Pulavarty wrote: Write Results: == I see degraded IO performance when doing sequential IO write tests with vhost-blk compared to virtio-blk. # time dd of=/dev/vda if=/dev/zero bs=2M oflag=direct I get ~110MB/sec with virtio-blk, but I get only ~60MB/sec with vhost-blk. Wondering why ? Try to look and number of interrupts and/or number of exits. I checked interrupts and IO exits - there is no major noticeable difference between vhost-blk and virtio-blk scenerios. It could also be that you are overrunning some queue. I don't see any exit mitigation strategy in your patch: when there are already lots of requests in a queue, it's usually a good idea to disable notifications and poll the queue as requests complete. That could help performance. Do you mean poll eventfd for new requests instead of waiting for new notifications ? Where do you do that in vhost-net code ? vhost_disable_notify does this. Unlike network socket, since we are dealing with a file, there is no -poll support for it. So I can't poll for the data. And also, Issue I am having is on the write() side. Not sure I understand. I looked at it some more - I see 512K write requests on the virtio-queue in both vhost-blk and virtio-blk cases. Both qemu or vhost is doing synchronous writes to page cache (there is no write batching in qemu that is affecting this case). I still puzzled on why virtio-blk outperforms vhost-blk. Thanks, Badari If you say the number of requests is the same, we are left with: - requests are smaller for some reason? - something is causing retries? No. IO requests sizes are exactly same (512K) in both cases. There are no retries or errors in both cases. One thing I am not clear is - for some reason guest kernel could push more data into virtio-ring in case of virtio-blk vs vhost-blk. Is this possible ? Does guest gets to run much sooner in virtio-blk case than vhost-blk ? Sorry, if its dumb question - I don't understand all the vhost details :( Thanks, Badari BTW, did you put the backend in non-blocking mode? As I said, vhost net passes non-blocking flag to socket backend, but vfs_write/read that you use does not have an option to do this. So we'll need to extend the backend to fix that, but just for demo purposes, you could set non-blocking mode on the file from userspace. Michael, Atleast I understand why the performance difference now.. My debug code is changed the behaviour of virtio-blk which confused me. 1) virtio-blk is able to batch up writes from various requests. 2) virtio-blk offloads the writes to different thread Where as vhost-blk, I do each request syncrhonously. Hence the difference. You are right - i have to make backend async. I will working on handing off work to in-kernel threads. I am not sure about IO completion handling and calling vhost_add_used_and_signal() out of context. But let me take a stab at it and ask your help if I run into issues. Thanks, Badari The way I did it for vhost net, requests are synchronous but non-blocking. So if it can't be done directly, I delay it. -- MST
Re: [Qemu-devel] [PATCH 3/3] S390: Don't compile in virtio-pci
On 3/24/10, Alexander Graf ag...@suse.de wrote: Blue Swirl wrote: On 3/24/10, Alexander Graf ag...@suse.de wrote: As soon as virtio-pci.c gets compiled and used on S390 the internal qdev magic gets confused and tries to give us PCI devices instead of S390 virtio devices. Since we don't have PCI on S390, we can safely not compile virtio-pci at all. In order to do this I added a new config option CONFIG_PCI that I enabled for every platform except S390. Thanks to this the change should be a complete nop for every other platform. The name should be CONFIG_VIRTIO_PCI, CONFIG_PCI would enable compilation of pci.c. My idea was to keep the config option generic and rip out all PCI stuff from s390x-softmmu. We don't even do MMIO :-). In that case you should also rip out all PCI cards.
[Qemu-devel] Re: [PATCH 0/4] monitor: Convert do_set_link() to QObject, QError
On Tue, 23 Mar 2010 11:27:54 +0100 Markus Armbruster arm...@redhat.com wrote: PATCH 3/4 changes syntax of set_link's second argument from up|down to on|off. I feel that the argument needs to be boolean in QMP, and this is the simplest way to get it. Alternatives I could try if the syntax change is unwanted: * Use the old string argument in QMP. Easy. * Don't convert set_link, create a new command with a boolean argument. * Create a argument parser for up|down. I like your approach. Daniel do you use set_link in libvirt already? I've grepped around I didn't found any reference for it. Markus Armbruster (4): monitor: Rename argument type 'b' to 'f' monitor: New argument type 'b' monitor: Use argument type 'b' for set_link monitor: Convert do_set_link() to QObject, QError monitor.c | 39 +++ net.c | 17 ++--- net.h |2 +- qemu-monitor.hx | 13 +++-- 4 files changed, 49 insertions(+), 22 deletions(-)
[Qemu-devel] [PATCH 2/2] machine opts framework
This patch adds initial support for the -machine option, that allows command line specification of machine attributes (always relying on safe defaults). Besides its value per-se, it is the saner way we found to allow for enabling/disabling of kvm's in-kernel irqchip. A machine with in-kernel-irqchip could be specified as: -machine irqchip=apic-kvm And one without it: -machine irqchip=apic To demonstrate how it'd work, this patch introduces a choice between pic and apic, pic being the old-style isa thing. --- hw/boards.h | 10 ++ hw/pc.c | 45 +++-- qemu-config.c | 16 qemu-config.h |1 + qemu-options.hx |9 + vl.c| 54 ++ 6 files changed, 129 insertions(+), 6 deletions(-) diff --git a/hw/boards.h b/hw/boards.h index 6f0f0d7..831728c 100644 --- a/hw/boards.h +++ b/hw/boards.h @@ -12,6 +12,15 @@ typedef void QEMUMachineInitFunc(ram_addr_t ram_size, const char *initrd_filename, const char *cpu_model); +typedef void (QEMUIrqchipFunc)(void *opaque); + +typedef struct QEMUIrqchip { +const char *name; +QEMUIrqchipFunc *init; +int used; +int is_default; +} QEMUIrqchip; + typedef struct QEMUMachine { const char *name; const char *alias; @@ -28,6 +37,7 @@ typedef struct QEMUMachine { no_sdcard:1; int is_default; GlobalProperty *compat_props; +QEMUIrqchip *irqchip; struct QEMUMachine *next; } QEMUMachine; diff --git a/hw/pc.c b/hw/pc.c index ba14df0..43ec022 100644 --- a/hw/pc.c +++ b/hw/pc.c @@ -752,21 +752,43 @@ int cpu_is_bsp(CPUState *env) return env-cpu_index == 0; } +static void qemu_apic_init(void *opaque) +{ +CPUState *env = opaque; +if (!(env-cpuid_features CPUID_APIC)) { +fprintf(stderr, CPU lacks APIC cpuid flag\n); +exit(1); +} +env-cpuid_apic_id = env-cpu_index; +/* APIC reset callback resets cpu */ +apic_init(env); +} + +static void qemu_pic_init(void *opaque) +{ +CPUState *env = opaque; + +if (smp_cpus 1) { +fprintf(stderr, PIC can't support smp systems\n); +exit(1); +} +qemu_register_reset((QEMUResetHandler*)cpu_reset, env); +} + static CPUState *pc_new_cpu(const char *cpu_model) { CPUState *env; +QEMUIrqchip *ic; env = cpu_init(cpu_model); if (!env) { fprintf(stderr, Unable to find x86 CPU definition\n); exit(1); } -if ((env-cpuid_features CPUID_APIC) || smp_cpus 1) { -env-cpuid_apic_id = env-cpu_index; -/* APIC reset callback resets cpu */ -apic_init(env); -} else { -qemu_register_reset((QEMUResetHandler*)cpu_reset, env); + +for (ic = current_machine-irqchip; ic-name != NULL; ic++) { +if (ic-used) +ic-init(env); } return env; } @@ -1074,6 +1096,17 @@ static QEMUMachine pc_machine = { .desc = Standard PC, .init = pc_init_pci, .max_cpus = 255, +.irqchip = (QEMUIrqchip[]){ +{ +.name = apic, +.init = qemu_apic_init, +.is_default = 1, +},{ +.name = pic, +.init = qemu_pic_init, +}, +{ /* end of list */ }, +}, .is_default = 1, }; diff --git a/qemu-config.c b/qemu-config.c index 150157c..2b985a9 100644 --- a/qemu-config.c +++ b/qemu-config.c @@ -296,6 +296,21 @@ QemuOptsList qemu_cpudef_opts = { }, }; +QemuOptsList qemu_machine_opts = { +.name = M, +.head = QTAILQ_HEAD_INITIALIZER(qemu_machine_opts.head), +.desc = { +{ +.name = mach, +.type = QEMU_OPT_STRING, +},{ +.name = irqchip, +.type = QEMU_OPT_STRING, +}, +{ /* end of list */ } +}, +}; + static QemuOptsList *lists[] = { qemu_drive_opts, qemu_chardev_opts, @@ -306,6 +321,7 @@ static QemuOptsList *lists[] = { qemu_global_opts, qemu_mon_opts, qemu_cpudef_opts, +qemu_machine_opts, NULL, }; diff --git a/qemu-config.h b/qemu-config.h index f217c58..ea302f0 100644 --- a/qemu-config.h +++ b/qemu-config.h @@ -10,6 +10,7 @@ extern QemuOptsList qemu_rtc_opts; extern QemuOptsList qemu_global_opts; extern QemuOptsList qemu_mon_opts; extern QemuOptsList qemu_cpudef_opts; +extern QemuOptsList qemu_machine_opts; QemuOptsList *qemu_find_opts(const char *group); int qemu_set_option(const char *str); diff --git a/qemu-options.hx b/qemu-options.hx index 8450b45..585ecd2 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -34,6 +34,15 @@ STEXI Select the emulated @var{machine} (@code{-M ?} for list) ETEXI +DEF(machine, HAS_ARG, QEMU_OPTION_machine, +-machine mach=m[,irqchip=chip]\n +select emulated machine (-machine ? for list)\n) +STEXI +...@item -machine
[Qemu-devel] [PATCH 1/2] early set current_machine
this way, the machine_init function itself can know which machine is current in use, not only the late init code. --- vl.c |5 +++-- 1 files changed, 3 insertions(+), 2 deletions(-) diff --git a/vl.c b/vl.c index d69250c..ceddeac 100644 --- a/vl.c +++ b/vl.c @@ -4841,6 +4841,9 @@ int main(int argc, char **argv, char **envp) } qemu_add_globals(); + +current_machine = machine; + machine-init(ram_size, boot_devices, kernel_filename, kernel_cmdline, initrd_filename, cpu_model); @@ -4859,8 +4862,6 @@ int main(int argc, char **argv, char **envp) } } -current_machine = machine; - /* init USB devices */ if (usb_enabled) { if (foreach_device_config(DEV_USB, usb_parse) 0) -- 1.6.6
[Qemu-devel] Re: Compile files only once: some planning
Blue Swirl blauwir...@gmail.com wrote: On 3/24/10, Juan Quintela quint...@redhat.com wrote: Blue Swirl blauwir...@gmail.com wrote: Hi, Here's some planning for getting most files compiled as few times as possible. Comments and suggestions are welcome. I took some thought about this at some point. Problems here start from Recursive Makefile condered Harmful (tm). Look at how we jump through hops to be able to compile things in one/other side. We have: Makefile Makefile.target (really lots of them, one for target) Makefile.hw Makefile.user If we had only a single Makefile, things in this department would be much, much easier. And no, convert to a single Makefile is not trivial either, but it would make things easier. Why do we have several Makefiles? Because we want to compile each file with different options. Why do we need to abuse so much VPATH? Because we need to bring files randomly from $(ROOT), $(ROOT)/hw $(ROOT)/$(TARGET). Would it help if the Makefiles were scattered to each directory, for example instead of Makefile.hw we had hw/Makefile? The interesting one is Makefile.target, hw/Makefile could help, but that is far away from where the action is. If you look at Makefile.target from far enough, you will see that it basically has: libobj-$(CONFIG_FOO) = ... ifdef CONFIG_LINUX_USER endif ifdef CONFIG_DARWIN_USER ... endif ifdef CONFIG_BSD_USER ... endif ifdef CONFIG_SOFTMMU ... endif The shared bits are very small (out of the libobj-y stuff). Spliting the others in different Makefiles (or whatever is easy). How to get this ones compiled only once per BASE_ARCH/whatever should put us near the goal of a single Makefile (and compiling each thing just the number of times required). Some thought is needed to know how to work here. Actually, Anthony suggested at some point to just use 64 bits for TARGET_PHYS_ADDR_BITS and remove the need for hw32/64. I think that people emulationg 32bits on 32bits would suffer, but have no clue how much. Anthony, what was the idea? Problem with this proposal is that it is not trivial to do in little steps, and the real big advantages appear when you switch to a single Makefile at the end. I may have missed something, but the compile process doesn't care about boards, because all boards for some architecture (and therefore all devices used by all boards) are linked to a single per-architecture executable. So why introduce the boards concept to Makefiles? I missunderstood this bit of your previous message: The target dependent cases should be next. On full build, each MIPS device file gets compiled four times, PPC files three times and x86 twice. The devices for architectures that are compiled only once (ARM, Cris, Sparc32 etc.) do not need to be touched. I was refering to this ones, but somehow got confused with boards :( vl.c: a lot of work. Maybe the CPUState stuff should be separated to a new file. That should just be a rule in Documentation. You can't but anything else in vl.c. If you move anything out of vl.c (see timers work from bonzini for example), you get a wild card for free commit bypassing maintainers or some similar price :) Cleaning up vl.c would be great, but just for purpose of single compilation, it's enough to put the CPUState pieces to a target dependent file (cpu-common.c?) and compile the rest once by making TARGET_xxx conditional code unconditional. This may still be doable. I haven't looked at detail at this :( rest of files I haven't really looked at them at depth. I looked when I cleaned up the build system, I thought how to do the next step (outlined before), but got sidetracked by other more urgent things. Thanks for the comments. You are welcome. Later, Juan.
[Qemu-devel] Re: [PATCH 2/4] monitor: New argument type 'b'
On Tue, 23 Mar 2010 11:27:56 +0100 Markus Armbruster arm...@redhat.com wrote: This is a boolean value. Human monitor accepts on or off. Consistent with option parsing (see parse_option_bool()). Signed-off-by: Markus Armbruster arm...@redhat.com --- monitor.c | 31 +++ 1 files changed, 31 insertions(+), 0 deletions(-) diff --git a/monitor.c b/monitor.c index 3ce9a4e..47b68a2 100644 --- a/monitor.c +++ b/monitor.c @@ -85,6 +85,8 @@ * * '?' optional type (for all types, except '/') * '.' other form of optional type (for 'i' and 'l') + * 'b' boolean + * user mode accepts on or off * '-' optional parameter (eg. '-f') * */ @@ -3841,6 +3843,29 @@ static const mon_cmd_t *monitor_parse_command(Monitor *mon, qdict_put(qdict, key, qfloat_from_double(val)); } break; +case 'b': +{ +const char *beg; +int val; + +while (qemu_isspace(*p)) { +p++; +} +beg = p; +while (qemu_isgraph(*p)) { +p++; +} +if (!strncmp(beg, on, p - beg)) { +val = 1; +} else if (!strncmp(beg, off, p - beg)) { +val = 0; +} else { +monitor_printf(mon, Expected 'on' or 'off'\n); +goto fail; +} This will make 'on' be the default when no on/off is specified, is that your intention? I'm wondering if this can cause problems when you add optional support for it and mixes it with other arguments. +qdict_put(qdict, key, qbool_from_int(val)); +} +break; case '-': { const char *tmp = p; @@ -4322,6 +4347,12 @@ static int check_arg(const CmdArgs *cmd_args, QDict *args) return -1; } break; +case 'b': +if (qobject_type(value) != QTYPE_QBOOL) { +qerror_report(QERR_INVALID_PARAMETER_TYPE, name, bool); +return -1; +} +break; case '-': if (qobject_type(value) != QTYPE_QINT qobject_type(value) != QTYPE_QBOOL) {
[Qemu-devel] Re: [PATCH 3/3] monitor: Convert do_migrate() to QError
On Tue, 23 Mar 2010 19:07:21 +0100 Markus Armbruster arm...@redhat.com wrote: Human monitor error message changes from unknown migration protocol: FOO to Invalid parameter uri. The conversion is shallow: the FOO_start_outgoing_migration() aren't converted. Converting them is a big job for relatively little practical benefit, so leave it for later. Signed-off-by: Markus Armbruster arm...@redhat.com --- migration.c |9 + 1 files changed, 5 insertions(+), 4 deletions(-) diff --git a/migration.c b/migration.c index 05f6cc5..47d2ab5 100644 --- a/migration.c +++ b/migration.c @@ -56,14 +56,14 @@ void qemu_start_incoming_migration(const char *uri) int do_migrate(Monitor *mon, const QDict *qdict, QObject **ret_data) { -MigrationState *s = NULL; +MigrationState *s; const char *p; int detach = qdict_get_int(qdict, detach); const char *uri = qdict_get_str(qdict, uri); if (current_migration current_migration-get_status(current_migration) == MIG_STATE_ACTIVE) { -monitor_printf(mon, migration already in progress\n); +qerror_report(QERR_MIGRATION_IN_PROGRESS); return -1; } What about QERR_OPERATION_IN_PROGRESS? So that we have: Operation already in progress: migration. @@ -86,12 +86,13 @@ int do_migrate(Monitor *mon, const QDict *qdict, QObject **ret_data) (int)qdict_get_int(qdict, inc)); #endif } else { -monitor_printf(mon, unknown migration protocol: %s\n, uri); +qerror_report(QERR_INVALID_PARAMETER, uri); return -1; } if (s == NULL) { -monitor_printf(mon, migration failed\n); +/* TODO push error reporting into the FOO_start_outgoing_migration() */ +qerror_report(QERR_MIGRATION_FAILED); return -1; } I think this one is no better than the automatic UndefinedError which is going to be triggered. I would only touch this when/if we get the migration functions converted.
Re: [Qemu-devel] [PATCH 2/2] machine opts framework
On 03/24/2010 02:26 PM, Glauber Costa wrote: This patch adds initial support for the -machine option, that allows command line specification of machine attributes (always relying on safe defaults). Besides its value per-se, it is the saner way we found to allow for enabling/disabling of kvm's in-kernel irqchip. A machine with in-kernel-irqchip could be specified as: -machine irqchip=apic-kvm And one without it: -machine irqchip=apic To demonstrate how it'd work, this patch introduces a choice between pic and apic, pic being the old-style isa thing. I started from a different place. See machine-qemuopts in my staging tree. I think we should combine efforts. Regards, Anthony Liguori --- hw/boards.h | 10 ++ hw/pc.c | 45 +++-- qemu-config.c | 16 qemu-config.h |1 + qemu-options.hx |9 + vl.c| 54 ++ 6 files changed, 129 insertions(+), 6 deletions(-) diff --git a/hw/boards.h b/hw/boards.h index 6f0f0d7..831728c 100644 --- a/hw/boards.h +++ b/hw/boards.h @@ -12,6 +12,15 @@ typedef void QEMUMachineInitFunc(ram_addr_t ram_size, const char *initrd_filename, const char *cpu_model); +typedef void (QEMUIrqchipFunc)(void *opaque); + +typedef struct QEMUIrqchip { +const char *name; +QEMUIrqchipFunc *init; +int used; +int is_default; +} QEMUIrqchip; + typedef struct QEMUMachine { const char *name; const char *alias; @@ -28,6 +37,7 @@ typedef struct QEMUMachine { no_sdcard:1; int is_default; GlobalProperty *compat_props; +QEMUIrqchip *irqchip; struct QEMUMachine *next; } QEMUMachine; diff --git a/hw/pc.c b/hw/pc.c index ba14df0..43ec022 100644 --- a/hw/pc.c +++ b/hw/pc.c @@ -752,21 +752,43 @@ int cpu_is_bsp(CPUState *env) return env-cpu_index == 0; } +static void qemu_apic_init(void *opaque) +{ +CPUState *env = opaque; +if (!(env-cpuid_features CPUID_APIC)) { +fprintf(stderr, CPU lacks APIC cpuid flag\n); +exit(1); +} +env-cpuid_apic_id = env-cpu_index; +/* APIC reset callback resets cpu */ +apic_init(env); +} + +static void qemu_pic_init(void *opaque) +{ +CPUState *env = opaque; + +if (smp_cpus 1) { +fprintf(stderr, PIC can't support smp systems\n); +exit(1); +} +qemu_register_reset((QEMUResetHandler*)cpu_reset, env); +} + static CPUState *pc_new_cpu(const char *cpu_model) { CPUState *env; +QEMUIrqchip *ic; env = cpu_init(cpu_model); if (!env) { fprintf(stderr, Unable to find x86 CPU definition\n); exit(1); } -if ((env-cpuid_features CPUID_APIC) || smp_cpus 1) { -env-cpuid_apic_id = env-cpu_index; -/* APIC reset callback resets cpu */ -apic_init(env); -} else { -qemu_register_reset((QEMUResetHandler*)cpu_reset, env); + +for (ic = current_machine-irqchip; ic-name != NULL; ic++) { +if (ic-used) +ic-init(env); } return env; } @@ -1074,6 +1096,17 @@ static QEMUMachine pc_machine = { .desc = Standard PC, .init = pc_init_pci, .max_cpus = 255, +.irqchip = (QEMUIrqchip[]){ +{ +.name = apic, +.init = qemu_apic_init, +.is_default = 1, +},{ +.name = pic, +.init = qemu_pic_init, +}, +{ /* end of list */ }, +}, .is_default = 1, }; diff --git a/qemu-config.c b/qemu-config.c index 150157c..2b985a9 100644 --- a/qemu-config.c +++ b/qemu-config.c @@ -296,6 +296,21 @@ QemuOptsList qemu_cpudef_opts = { }, }; +QemuOptsList qemu_machine_opts = { +.name = M, +.head = QTAILQ_HEAD_INITIALIZER(qemu_machine_opts.head), +.desc = { +{ +.name = mach, +.type = QEMU_OPT_STRING, +},{ +.name = irqchip, +.type = QEMU_OPT_STRING, +}, +{ /* end of list */ } +}, +}; + static QemuOptsList *lists[] = { qemu_drive_opts, qemu_chardev_opts, @@ -306,6 +321,7 @@ static QemuOptsList *lists[] = { qemu_global_opts, qemu_mon_opts, qemu_cpudef_opts, +qemu_machine_opts, NULL, }; diff --git a/qemu-config.h b/qemu-config.h index f217c58..ea302f0 100644 --- a/qemu-config.h +++ b/qemu-config.h @@ -10,6 +10,7 @@ extern QemuOptsList qemu_rtc_opts; extern QemuOptsList qemu_global_opts; extern QemuOptsList qemu_mon_opts; extern QemuOptsList qemu_cpudef_opts; +extern QemuOptsList qemu_machine_opts; QemuOptsList *qemu_find_opts(const char *group); int qemu_set_option(const char *str); diff --git a/qemu-options.hx b/qemu-options.hx index 8450b45..585ecd2 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -34,6 +34,15 @@ STEXI Select the
[Qemu-devel] Re: [PATCH] update bochs vbe interface
On 03/24/10 18:04, Juan Quintela wrote: Gerd Hoffmannkra...@redhat.com wrote: The bochs vbe interface got a new register a while back, which specifies the linear framebuffer size in 64k units. This patch adds support for the new register to qemu. With this patch applied vgabios 0.6c works with qemu. Signed-off-by: Gerd Hoffmannkra...@redhat.com It breaks migration. vga.c:2218:VMSTATE_UINT16_ARRAY(vbe_regs, VGACommonState, VBE_DISPI_INDEX_NB), 2218 is the interesting line. Can we freeze this patch until the subsections stuff is done? Yea, I see. Well, the new register doesn't carry any state, it is read-only information for the guest. So the easy way out is to simply not save it as we don't have to. cheers, Gerd
[Qemu-devel] [PATCH v2] update bochs vbe interface
The bochs vbe interface got a new register a while back, which specifies the linear framebuffer size in 64k units. This patch adds support for the new register to qemu. With this patch applied vgabios 0.6c works with qemu. [ v2: Don't savevm the new register. Doing so breaks migration, and as it carries read-only information for the guest there is no need to save it. ] --- hw/vga.c |5 +++-- hw/vga_int.h |6 +- 2 files changed, 8 insertions(+), 3 deletions(-) diff --git a/hw/vga.c b/hw/vga.c index 6a1a059..f9e07cf 100644 --- a/hw/vga.c +++ b/hw/vga.c @@ -1955,7 +1955,8 @@ void vga_common_reset(VGACommonState *s) #ifdef CONFIG_BOCHS_VBE s-vbe_index = 0; memset(s-vbe_regs, '\0', sizeof(s-vbe_regs)); -s-vbe_regs[VBE_DISPI_INDEX_ID] = VBE_DISPI_ID0; +s-vbe_regs[VBE_DISPI_INDEX_ID] = VBE_DISPI_ID5; +s-vbe_regs[VBE_DISPI_INDEX_VIDEO_MEMORY_64K] = s-vram_size / (64 * 1024); s-vbe_start_addr = 0; s-vbe_line_offset = 0; s-vbe_bank_mask = (s-vram_size 16) - 1; @@ -2215,7 +2216,7 @@ const VMStateDescription vmstate_vga_common = { VMSTATE_UINT8_EQUAL(is_vbe_vmstate, VGACommonState), #ifdef CONFIG_BOCHS_VBE VMSTATE_UINT16(vbe_index, VGACommonState), -VMSTATE_UINT16_ARRAY(vbe_regs, VGACommonState, VBE_DISPI_INDEX_NB), +VMSTATE_UINT16_ARRAY(vbe_regs, VGACommonState, VBE_DISPI_INDEX_NB_VMSTATE), VMSTATE_UINT32(vbe_start_addr, VGACommonState), VMSTATE_UINT32(vbe_line_offset, VGACommonState), VMSTATE_UINT32(vbe_bank_mask, VGACommonState), diff --git a/hw/vga_int.h b/hw/vga_int.h index 23a42ef..c3c5e21 100644 --- a/hw/vga_int.h +++ b/hw/vga_int.h @@ -47,13 +47,17 @@ #define VBE_DISPI_INDEX_VIRT_HEIGHT 0x7 #define VBE_DISPI_INDEX_X_OFFSET0x8 #define VBE_DISPI_INDEX_Y_OFFSET0x9 -#define VBE_DISPI_INDEX_NB 0xa +#define VBE_DISPI_INDEX_VIDEO_MEMORY_64K 0xa + +#define VBE_DISPI_INDEX_NB_VMSTATE 0xa +#define VBE_DISPI_INDEX_NB 0xb #define VBE_DISPI_ID0 0xB0C0 #define VBE_DISPI_ID1 0xB0C1 #define VBE_DISPI_ID2 0xB0C2 #define VBE_DISPI_ID3 0xB0C3 #define VBE_DISPI_ID4 0xB0C4 +#define VBE_DISPI_ID5 0xB0C5 #define VBE_DISPI_DISABLED 0x00 #define VBE_DISPI_ENABLED 0x01 -- 1.6.6.1
Re: [Qemu-devel] Re: [libvirt] Supporting hypervisor specific APIs in libvirt
On 03/24/2010 06:42 PM, Luiz Capitulino wrote: On Wed, 24 Mar 2010 12:42:16 +0200 Avi Kivitya...@redhat.com wrote: So, at best qemud is a toy for people who are annoyed by libvirt. Is the reason for doing this in qemu because libvirt is annoying? Mostly. I don't see how adding yet another layer/daemon is going to improve ours and user's life (the same applies for libqemu). libvirt becomes optional. If I got it right, there were two complaints from the kvm-devel flamewar: 1. Qemu has usability problems 2. There's no way an external tool can get /proc/kallsyms info from Qemu I don't see how libqemu can help with 1) and having qemud doesn't seem the best solution for 2) either. Still talking about 2), what's wrong in getting the PID or having a QMP connection in a well known location as suggested by Anthony? I now believe that's the best option. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
[Qemu-devel] Re: Compile files only once: some planning
On 3/24/10, Michael S. Tsirkin m...@redhat.com wrote: On Tue, Mar 23, 2010 at 11:43:51PM +0200, Blue Swirl wrote: rtl8139.c, e1000.c: need to convert ldl/stl to cpu_physical_memory_read/write. I don't see how it would help. These still get target_phys_addr_t which is per-target. Further, a ton of devices do cpu_register_physical_memory/qemu_register_coalesced_mmio. These are also per target. I don't know what I was eating yesterday: there are no references to ldl/stl in either rtl8139 or e1000. In fact, the conversion is simple for the device itself, just add a property be. The attached patch performs this part. But now there is a bigger problem, how to pass the property to the device. It's not fair to require the user to remember to set it. A simple solution would be to change all of cpu_XX functions to get a 64 bit address. This is a lot of churn, if we do this anyway we should also pass length to callbacks, this way rwhandler will get very small or go away completely. It's not too much effort to keep the target_phys_addr_t type. From e0ab5cc41c68207be558ccb330f4fb83fba4ee6f Mon Sep 17 00:00:00 2001 From: Blue Swirl blauwir...@gmail.com Date: Wed, 24 Mar 2010 19:54:05 + Subject: [PATCH] Compile rtl8139 and e1000 only once WIP Signed-off-by: Blue Swirl blauwir...@gmail.com --- Makefile.objs |2 + Makefile.target |4 -- hw/e1000.c | 108 ++ hw/rtl8139.c| 82 +++--- 4 files changed, 147 insertions(+), 49 deletions(-) diff --git a/Makefile.objs b/Makefile.objs index 281f7a6..54895f8 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -155,6 +155,8 @@ hw-obj-y += msix.o hw-obj-y += ne2000.o hw-obj-y += eepro100.o hw-obj-y += pcnet.o +hw-obj-y += rtl8139.o +hw-obj-y += e1000.o hw-obj-$(CONFIG_SMC91C111) += smc91c111.o hw-obj-$(CONFIG_LAN9118) += lan9118.o diff --git a/Makefile.target b/Makefile.target index eb4d010..1a86fc4 100644 --- a/Makefile.target +++ b/Makefile.target @@ -176,10 +176,6 @@ QEMU_CFLAGS += $(VNC_SASL_CFLAGS) # xen backend driver support obj-$(CONFIG_XEN) += xen_machine_pv.o xen_domainbuild.o -# PCI network cards -obj-y += rtl8139.o -obj-y += e1000.o - # Hardware support obj-i386-y = ide/core.o obj-i386-y += pckbd.o dma.o diff --git a/hw/e1000.c b/hw/e1000.c index fd3059a..0f72db8 100644 --- a/hw/e1000.c +++ b/hw/e1000.c @@ -121,6 +121,7 @@ typedef struct E1000State_st { uint16_t reading; uint32_t old_eecd; } eecd_state; +uint32_t be; } E1000State; #define defreg(x) x = (E1000_##x2) @@ -825,14 +826,11 @@ static void (*macreg_writeops[])(E1000State *, int, uint32_t) = { enum { NWRITEOPS = ARRAY_SIZE(macreg_writeops) }; static void -e1000_mmio_writel(void *opaque, target_phys_addr_t addr, uint32_t val) +e1000_mmio_writel_le(void *opaque, target_phys_addr_t addr, uint32_t val) { E1000State *s = opaque; unsigned int index = (addr 0x1) 2; -#ifdef TARGET_WORDS_BIGENDIAN -val = bswap32(val); -#endif if (index NWRITEOPS macreg_writeops[index]) macreg_writeops[index](s, index, val); else if (index NREADOPS macreg_readops[index]) @@ -841,25 +839,47 @@ e1000_mmio_writel(void *opaque, target_phys_addr_t addr, uint32_t val) DBGOUT(UNKNOWN, MMIO unknown write addr=0x%08x,val=0x%08x\n, index2, val); } +static void +e1000_mmio_writel_be(void *opaque, target_phys_addr_t addr, uint32_t val) +{ +val = bswap32(val); +e1000_mmio_writel_le(opaque, addr, val); +} static void -e1000_mmio_writew(void *opaque, target_phys_addr_t addr, uint32_t val) +e1000_mmio_writew_le(void *opaque, target_phys_addr_t addr, uint32_t val) { // emulate hw without byte enables: no RMW -e1000_mmio_writel(opaque, addr ~3, - (val 0x) (8*(addr 3))); +e1000_mmio_writel_le(opaque, addr ~3, + (val 0x) (8*(addr 3))); } static void -e1000_mmio_writeb(void *opaque, target_phys_addr_t addr, uint32_t val) +e1000_mmio_writew_be(void *opaque, target_phys_addr_t addr, uint32_t val) { // emulate hw without byte enables: no RMW -e1000_mmio_writel(opaque, addr ~3, - (val 0xff) (8*(addr 3))); +e1000_mmio_writel_be(opaque, addr ~3, + (val 0x) (8*(addr 3))); +} + +static void +e1000_mmio_writeb_be(void *opaque, target_phys_addr_t addr, uint32_t val) +{ +// emulate hw without byte enables: no RMW +e1000_mmio_writel_be(opaque, addr ~3, + (val 0xff) (8*(addr 3))); +} + +static void +e1000_mmio_writeb_le(void *opaque, target_phys_addr_t addr, uint32_t val) +{ +// emulate hw without byte enables: no RMW +e1000_mmio_writel_le(opaque, addr ~3, + (val 0xff) (8*(addr 3))); } static uint32_t -e1000_mmio_readl(void *opaque, target_phys_addr_t addr) +e1000_mmio_readl_le(void
Re: [Qemu-devel] Re: [libvirt] Supporting hypervisor specific APIs in libvirt
On Wed, 24 Mar 2010 21:49:45 +0200 Avi Kivity a...@redhat.com wrote: On 03/24/2010 06:42 PM, Luiz Capitulino wrote: On Wed, 24 Mar 2010 12:42:16 +0200 Avi Kivitya...@redhat.com wrote: So, at best qemud is a toy for people who are annoyed by libvirt. Is the reason for doing this in qemu because libvirt is annoying? Mostly. I don't see how adding yet another layer/daemon is going to improve ours and user's life (the same applies for libqemu). libvirt becomes optional. I think it should only be optional if all you want is to run a single VM in this case what seems to be missing on our side is a _real_ GUI, bundled with QEMU potentially written in a high-level language. Then we make virt-manager optional and this is good because we can sync features way faster and we don't have to care about _managing_ several VMs, our world in terms of usability and maintainability is about one VM. IMVHO, everything else should be done by third-party tools like libvirt, we just provide the means for it.
Re: [Qemu-devel] Re: Compile files only once: some planning
On 03/24/2010 10:07 AM, Richard Henderson wrote: struct CPUSmallCommonState { // most of the stuff from CPU_COMMON. // sorted for some thought of padding elimination. ;-) }; struct CPULargeCommonState { CPUTLBEntry tlb_table[NB_MMU_MODES][CPU_TLB_SIZE]; target_phys_addr_t iotlb[NB_MMU_MODES][CPU_TLB_SIZE]; struct TranslationBlock *tb_jmp_cache[TB_JMP_CACHE_SIZE]; jmp_buf jmp_env; }; ... Now. If you're compiling a file for which cpu-specific code is ok: register CPUXYZLargeState *env __asm__(AREG0); #define ENV_SMALL_COMMON_STATE(env-s.common_s) #define ENV_LARGE_COMMON_STATE(env-common_l) If you're compiling a file which is supposed to be independant of cpu: register CPUSmallCommonState *env __asm__(AREG0); #define ENV_SMALL_COMMON_STATE(env) #define ENV_LARGE_COMMON_STATE((CPULargeCommonState *)((char *)env + cpu_large_state_offset)) On 03/24/2010 11:00 AM, Blue Swirl wrote: One trick is to define a fixed offset (about half CPUState size) and make env point to CPUState + offset. Then the negative part of the offset space can be used efficiently. But this just doubles the space that can be accessed fast, so it's not a big win. A good idea. We can eliminate the cpu_large_state_offset from above via: struct CPUSmallCommonState { // most of the stuff from CPU_COMMON. } __attribute__((aligned)); struct CPUXYZPrivateState { // all the cpu-specific stuff }; struct CPUXYZSmallState { CPUXYZPrivateState p; CPUSmallCommonState s; }; struct CPUXYZAllState { CPUXYZSmallState s; CPULargeCommonState l; // ARG0 register points here. }; register void *biased_env __asm__(AREG0); static inline CPUXYZPrivateState *get_env_cpu_private(void) { return ((CPUXYZSmallState *)biased_env - 1)-p; } static inline CPUSmallCommonState *get_env_common_small(void) { return (CPUSmallCommonState *)biased_env - 1; } static inline CPULargeCommonState *get_env_common_large(void) { return (CPULargeCommonState *)biased_env; } r~
[Qemu-devel] Re: Compile files only once: some planning
On Wed, Mar 24, 2010 at 10:05:10PM +0200, Blue Swirl wrote: On 3/24/10, Michael S. Tsirkin m...@redhat.com wrote: On Tue, Mar 23, 2010 at 11:43:51PM +0200, Blue Swirl wrote: rtl8139.c, e1000.c: need to convert ldl/stl to cpu_physical_memory_read/write. I don't see how it would help. These still get target_phys_addr_t which is per-target. Further, a ton of devices do cpu_register_physical_memory/qemu_register_coalesced_mmio. These are also per target. I don't know what I was eating yesterday: there are no references to ldl/stl in either rtl8139 or e1000. In fact, the conversion is simple for the device itself, just add a property be. The attached patch performs this part. But now there is a bigger problem, how to pass the property to the device. It's not fair to require the user to remember to set it. I still don't fully understand how come e1000 cares about target endianness. A simple solution would be to change all of cpu_XX functions to get a 64 bit address. This is a lot of churn, if we do this anyway we should also pass length to callbacks, this way rwhandler will get very small or go away completely. It's not too much effort to keep the target_phys_addr_t type. I don't understand - target_phys_addr_t is different for different targets to we will need to recompile the code for each target. What am I missing? From e0ab5cc41c68207be558ccb330f4fb83fba4ee6f Mon Sep 17 00:00:00 2001 From: Blue Swirl blauwir...@gmail.com Date: Wed, 24 Mar 2010 19:54:05 + Subject: [PATCH] Compile rtl8139 and e1000 only once WIP Signed-off-by: Blue Swirl blauwir...@gmail.com --- Makefile.objs |2 + Makefile.target |4 -- hw/e1000.c | 108 ++ hw/rtl8139.c| 82 +++--- 4 files changed, 147 insertions(+), 49 deletions(-) diff --git a/Makefile.objs b/Makefile.objs index 281f7a6..54895f8 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -155,6 +155,8 @@ hw-obj-y += msix.o hw-obj-y += ne2000.o hw-obj-y += eepro100.o hw-obj-y += pcnet.o +hw-obj-y += rtl8139.o +hw-obj-y += e1000.o hw-obj-$(CONFIG_SMC91C111) += smc91c111.o hw-obj-$(CONFIG_LAN9118) += lan9118.o diff --git a/Makefile.target b/Makefile.target index eb4d010..1a86fc4 100644 --- a/Makefile.target +++ b/Makefile.target @@ -176,10 +176,6 @@ QEMU_CFLAGS += $(VNC_SASL_CFLAGS) # xen backend driver support obj-$(CONFIG_XEN) += xen_machine_pv.o xen_domainbuild.o -# PCI network cards -obj-y += rtl8139.o -obj-y += e1000.o - # Hardware support obj-i386-y = ide/core.o obj-i386-y += pckbd.o dma.o diff --git a/hw/e1000.c b/hw/e1000.c index fd3059a..0f72db8 100644 --- a/hw/e1000.c +++ b/hw/e1000.c @@ -121,6 +121,7 @@ typedef struct E1000State_st { uint16_t reading; uint32_t old_eecd; } eecd_state; +uint32_t be; } E1000State; #define defreg(x) x = (E1000_##x2) @@ -825,14 +826,11 @@ static void (*macreg_writeops[])(E1000State *, int, uint32_t) = { enum { NWRITEOPS = ARRAY_SIZE(macreg_writeops) }; static void -e1000_mmio_writel(void *opaque, target_phys_addr_t addr, uint32_t val) +e1000_mmio_writel_le(void *opaque, target_phys_addr_t addr, uint32_t val) { E1000State *s = opaque; unsigned int index = (addr 0x1) 2; -#ifdef TARGET_WORDS_BIGENDIAN -val = bswap32(val); -#endif if (index NWRITEOPS macreg_writeops[index]) macreg_writeops[index](s, index, val); else if (index NREADOPS macreg_readops[index]) @@ -841,25 +839,47 @@ e1000_mmio_writel(void *opaque, target_phys_addr_t addr, uint32_t val) DBGOUT(UNKNOWN, MMIO unknown write addr=0x%08x,val=0x%08x\n, index2, val); } +static void +e1000_mmio_writel_be(void *opaque, target_phys_addr_t addr, uint32_t val) +{ +val = bswap32(val); +e1000_mmio_writel_le(opaque, addr, val); +} static void -e1000_mmio_writew(void *opaque, target_phys_addr_t addr, uint32_t val) +e1000_mmio_writew_le(void *opaque, target_phys_addr_t addr, uint32_t val) { // emulate hw without byte enables: no RMW -e1000_mmio_writel(opaque, addr ~3, - (val 0x) (8*(addr 3))); +e1000_mmio_writel_le(opaque, addr ~3, + (val 0x) (8*(addr 3))); } static void -e1000_mmio_writeb(void *opaque, target_phys_addr_t addr, uint32_t val) +e1000_mmio_writew_be(void *opaque, target_phys_addr_t addr, uint32_t val) { // emulate hw without byte enables: no RMW -e1000_mmio_writel(opaque, addr ~3, - (val 0xff) (8*(addr 3))); +e1000_mmio_writel_be(opaque, addr ~3, + (val 0x) (8*(addr 3))); +} + +static void +e1000_mmio_writeb_be(void *opaque, target_phys_addr_t addr, uint32_t val) +{ +// emulate hw without byte enables: no RMW +
[Qemu-devel] Re: Compile files only once: some planning
On 3/24/10, Michael S. Tsirkin m...@redhat.com wrote: On Wed, Mar 24, 2010 at 10:05:10PM +0200, Blue Swirl wrote: On 3/24/10, Michael S. Tsirkin m...@redhat.com wrote: On Tue, Mar 23, 2010 at 11:43:51PM +0200, Blue Swirl wrote: rtl8139.c, e1000.c: need to convert ldl/stl to cpu_physical_memory_read/write. I don't see how it would help. These still get target_phys_addr_t which is per-target. Further, a ton of devices do cpu_register_physical_memory/qemu_register_coalesced_mmio. These are also per target. I don't know what I was eating yesterday: there are no references to ldl/stl in either rtl8139 or e1000. In fact, the conversion is simple for the device itself, just add a property be. The attached patch performs this part. But now there is a bigger problem, how to pass the property to the device. It's not fair to require the user to remember to set it. I still don't fully understand how come e1000 cares about target endianness. It shouldn't. Maybe the real fix is to remove the byte swapping. A simple solution would be to change all of cpu_XX functions to get a 64 bit address. This is a lot of churn, if we do this anyway we should also pass length to callbacks, this way rwhandler will get very small or go away completely. It's not too much effort to keep the target_phys_addr_t type. I don't understand - target_phys_addr_t is different for different targets to we will need to recompile the code for each target. What am I missing? target_phys_addr_t is 64 bit on a 64 bit host, on a 32 bit host it's size will be either 64 or 32 bits depending on the target. So the files are compiled once on 64 bit host, twice on 32 bit host if both 32 and 64 bits targets are selected. From e0ab5cc41c68207be558ccb330f4fb83fba4ee6f Mon Sep 17 00:00:00 2001 From: Blue Swirl blauwir...@gmail.com Date: Wed, 24 Mar 2010 19:54:05 + Subject: [PATCH] Compile rtl8139 and e1000 only once WIP Signed-off-by: Blue Swirl blauwir...@gmail.com --- Makefile.objs |2 + Makefile.target |4 -- hw/e1000.c | 108 ++ hw/rtl8139.c| 82 +++--- 4 files changed, 147 insertions(+), 49 deletions(-) diff --git a/Makefile.objs b/Makefile.objs index 281f7a6..54895f8 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -155,6 +155,8 @@ hw-obj-y += msix.o hw-obj-y += ne2000.o hw-obj-y += eepro100.o hw-obj-y += pcnet.o +hw-obj-y += rtl8139.o +hw-obj-y += e1000.o hw-obj-$(CONFIG_SMC91C111) += smc91c111.o hw-obj-$(CONFIG_LAN9118) += lan9118.o diff --git a/Makefile.target b/Makefile.target index eb4d010..1a86fc4 100644 --- a/Makefile.target +++ b/Makefile.target @@ -176,10 +176,6 @@ QEMU_CFLAGS += $(VNC_SASL_CFLAGS) # xen backend driver support obj-$(CONFIG_XEN) += xen_machine_pv.o xen_domainbuild.o -# PCI network cards -obj-y += rtl8139.o -obj-y += e1000.o - # Hardware support obj-i386-y = ide/core.o obj-i386-y += pckbd.o dma.o diff --git a/hw/e1000.c b/hw/e1000.c index fd3059a..0f72db8 100644 --- a/hw/e1000.c +++ b/hw/e1000.c @@ -121,6 +121,7 @@ typedef struct E1000State_st { uint16_t reading; uint32_t old_eecd; } eecd_state; +uint32_t be; } E1000State; #define defreg(x) x = (E1000_##x2) @@ -825,14 +826,11 @@ static void (*macreg_writeops[])(E1000State *, int, uint32_t) = { enum { NWRITEOPS = ARRAY_SIZE(macreg_writeops) }; static void -e1000_mmio_writel(void *opaque, target_phys_addr_t addr, uint32_t val) +e1000_mmio_writel_le(void *opaque, target_phys_addr_t addr, uint32_t val) { E1000State *s = opaque; unsigned int index = (addr 0x1) 2; -#ifdef TARGET_WORDS_BIGENDIAN -val = bswap32(val); -#endif if (index NWRITEOPS macreg_writeops[index]) macreg_writeops[index](s, index, val); else if (index NREADOPS macreg_readops[index]) @@ -841,25 +839,47 @@ e1000_mmio_writel(void *opaque, target_phys_addr_t addr, uint32_t val) DBGOUT(UNKNOWN, MMIO unknown write addr=0x%08x,val=0x%08x\n, index2, val); } +static void +e1000_mmio_writel_be(void *opaque, target_phys_addr_t addr, uint32_t val) +{ +val = bswap32(val); +e1000_mmio_writel_le(opaque, addr, val); +} static void -e1000_mmio_writew(void *opaque, target_phys_addr_t addr, uint32_t val) +e1000_mmio_writew_le(void *opaque, target_phys_addr_t addr, uint32_t val) { // emulate hw without byte enables: no RMW -e1000_mmio_writel(opaque, addr ~3, - (val 0x) (8*(addr 3))); +e1000_mmio_writel_le(opaque, addr ~3, + (val 0x) (8*(addr
[Qemu-devel] Re: Compile files only once: some planning
On Wed, Mar 24, 2010 at 10:24:12PM +0200, Blue Swirl wrote: On 3/24/10, Michael S. Tsirkin m...@redhat.com wrote: On Wed, Mar 24, 2010 at 10:05:10PM +0200, Blue Swirl wrote: On 3/24/10, Michael S. Tsirkin m...@redhat.com wrote: On Tue, Mar 23, 2010 at 11:43:51PM +0200, Blue Swirl wrote: rtl8139.c, e1000.c: need to convert ldl/stl to cpu_physical_memory_read/write. I don't see how it would help. These still get target_phys_addr_t which is per-target. Further, a ton of devices do cpu_register_physical_memory/qemu_register_coalesced_mmio. These are also per target. I don't know what I was eating yesterday: there are no references to ldl/stl in either rtl8139 or e1000. In fact, the conversion is simple for the device itself, just add a property be. The attached patch performs this part. But now there is a bigger problem, how to pass the property to the device. It's not fair to require the user to remember to set it. I still don't fully understand how come e1000 cares about target endianness. It shouldn't. Maybe the real fix is to remove the byte swapping. Presumably it's there for a reason? A simple solution would be to change all of cpu_XX functions to get a 64 bit address. This is a lot of churn, if we do this anyway we should also pass length to callbacks, this way rwhandler will get very small or go away completely. It's not too much effort to keep the target_phys_addr_t type. I don't understand - target_phys_addr_t is different for different targets to we will need to recompile the code for each target. What am I missing? target_phys_addr_t is 64 bit on a 64 bit host, on a 32 bit host it's size will be either 64 or 32 bits depending on the target. So the files are compiled once on 64 bit host, twice on 32 bit host if both 32 and 64 bits targets are selected. How about just making it 64 bit unconditionally? How much do we save by using a 32 bit address value? -- MST
Re: [Qemu-devel] Re: Compile files only once: some planning
On 03/24/2010 03:24 PM, Blue Swirl wrote: On 3/24/10, Michael S. Tsirkinm...@redhat.com wrote: On Wed, Mar 24, 2010 at 10:05:10PM +0200, Blue Swirl wrote: On 3/24/10, Michael S. Tsirkinm...@redhat.com wrote: On Tue, Mar 23, 2010 at 11:43:51PM +0200, Blue Swirl wrote: rtl8139.c, e1000.c: need to convert ldl/stl to cpu_physical_memory_read/write. I don't see how it would help. These still get target_phys_addr_t which is per-target. Further, a ton of devices do cpu_register_physical_memory/qemu_register_coalesced_mmio. These are also per target. I don't know what I was eating yesterday: there are no references to ldl/stl in either rtl8139 or e1000. In fact, the conversion is simple for the device itself, just add a property be. The attached patch performs this part. But now there is a bigger problem, how to pass the property to the device. It's not fair to require the user to remember to set it. I still don't fully understand how come e1000 cares about target endianness. It shouldn't. Maybe the real fix is to remove the byte swapping. My previous pci memory functions patches removed the byte swapping. The problem is that PCI devices are going to operate in little endian mode (usually) whereas the CPU may be acting in big endian mode. We need to do a byte swap somewhere but the better place to do it is in the PCI bus layer. Regards, Anthony Liguori
[Qemu-devel] Re: Compile files only once: some planning
On 3/24/10, Michael S. Tsirkin m...@redhat.com wrote: On Wed, Mar 24, 2010 at 10:24:12PM +0200, Blue Swirl wrote: On 3/24/10, Michael S. Tsirkin m...@redhat.com wrote: On Wed, Mar 24, 2010 at 10:05:10PM +0200, Blue Swirl wrote: On 3/24/10, Michael S. Tsirkin m...@redhat.com wrote: On Tue, Mar 23, 2010 at 11:43:51PM +0200, Blue Swirl wrote: rtl8139.c, e1000.c: need to convert ldl/stl to cpu_physical_memory_read/write. I don't see how it would help. These still get target_phys_addr_t which is per-target. Further, a ton of devices do cpu_register_physical_memory/qemu_register_coalesced_mmio. These are also per target. I don't know what I was eating yesterday: there are no references to ldl/stl in either rtl8139 or e1000. In fact, the conversion is simple for the device itself, just add a property be. The attached patch performs this part. But now there is a bigger problem, how to pass the property to the device. It's not fair to require the user to remember to set it. I still don't fully understand how come e1000 cares about target endianness. It shouldn't. Maybe the real fix is to remove the byte swapping. Presumably it's there for a reason? A simple solution would be to change all of cpu_XX functions to get a 64 bit address. This is a lot of churn, if we do this anyway we should also pass length to callbacks, this way rwhandler will get very small or go away completely. It's not too much effort to keep the target_phys_addr_t type. I don't understand - target_phys_addr_t is different for different targets to we will need to recompile the code for each target. What am I missing? target_phys_addr_t is 64 bit on a 64 bit host, on a 32 bit host it's size will be either 64 or 32 bits depending on the target. So the files are compiled once on 64 bit host, twice on 32 bit host if both 32 and 64 bits targets are selected. How about just making it 64 bit unconditionally? How much do we save by using a 32 bit address value? On a 32 bit host, probably a lot because of register pressure. And it's not too much effort to keep the target_phys_addr_t type logic.
[Qemu-devel] Re: [PATCH 10/15] virtio-serial: Add QMP events for failed port/device add
On Wed, 24 Mar 2010 20:19:28 +0530 Amit Shah amit.s...@redhat.com wrote: When adding a port or a device to the guest fails, management software might be interested in knowing and then cleaning up the host-side of the port. Introduce QMP events to signal such errors. I'm completely unfamiliar with virtio-serial, so let me ask: how are ports added? I'd expect the command performing this operation to fail in this case.
Re: [Qemu-devel] [PATCH] qemu: jaso-parser: Output the content of invalid keyword
On Wed, 24 Mar 2010 17:00:14 +0100 Markus Armbruster arm...@redhat.com wrote: Amos Kong ak...@redhat.com writes: When input some invialid word 'unknowcmd' through QMP port, qemu outputs this error message: parse error: invalid keyword `%s' This patch makes qemu output the content of invalid keyword, like: parse error: invalid keyword `unknowcmd' Signed-off-by: Amos Kong ak...@redhat.com Looks good to me. Hint: it's best to put a version in the subject when you respin, like [PATCH v2] ... Yes, and maintainers may miss a patch down a thread (and it's a good opportunity to fix the subject).
[Qemu-devel] x86_64: iret in long mode resets %fs and %gs base (doesn't on real CPUs)
Hi, I've been investigating why some of my code failed on qemu, but succeeded in bochs and on real hardware. In particular, it turns out that qemu would reset the FS/GS_BASE_MSR whenever I did iret from ring 0 to 3. I traced it down to this bit of code (in target-i386/op_helper.c): static inline void validate_seg(int seg_reg, int cpl) { int dpl; uint32_t e2; /* XXX: on x86_64, we do not want to nullify FS and GS because they may still contain a valid base. I would be interested to know how a real x86_64 CPU behaves */ if ((seg_reg == R_FS || seg_reg == R_GS) (env-segs[seg_reg].selector 0xfffc) == 0) return; So the reason why this didn't work in qemu for me was that I was loading the selector as 8 -- which fails the above test and validate_seg() proceeds to clear the segment base value. Changing my own code to only load 0 into %gs from the start fixed the problem for me. However, qemu is clearly doing something differently from the real hardware. I tested both versions (loading 0 or 8 into %gs) on my Intel P4, and GS_BASE_MSR is preserved in both cases. Perhaps the condition on the selector value should be removed? (Or perhaps the calls to validate_seg() for R_FS/R_GS should be removed from helper_ret_protected()?) Just a heads up. Vegard
[Qemu-devel] [PATCH] Compile ide/core only once
Make win2k install hack unconditional as it is still restricted to x86 only in vl.c. Replace TARGET_PAGE_SIZE with 4096 because that figure is already used later. Signed-off-by: Blue Swirl blauwir...@gmail.com --- Makefile.objs|1 + Makefile.target | 11 --- default-configs/arm-softmmu.mak |1 + default-configs/i386-softmmu.mak |1 + default-configs/mips-softmmu.mak |1 + default-configs/mips64-softmmu.mak |1 + default-configs/mips64el-softmmu.mak |1 + default-configs/mipsel-softmmu.mak |1 + default-configs/ppc-softmmu.mak |1 + default-configs/ppc64-softmmu.mak|1 + default-configs/ppcemb-softmmu.mak |1 + default-configs/sh4-softmmu.mak |1 + default-configs/sh4eb-softmmu.mak|1 + default-configs/sparc64-softmmu.mak |1 + default-configs/x86_64-softmmu.mak |1 + hw/ide/core.c| 10 +++--- vl.c |2 +- 17 files changed, 22 insertions(+), 15 deletions(-) diff --git a/Makefile.objs b/Makefile.objs index 281f7a6..fe81f6c 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -161,6 +161,7 @@ hw-obj-$(CONFIG_LAN9118) += lan9118.o hw-obj-$(CONFIG_NE2000_ISA) += ne2000-isa.o # IDE +hw-obj-$(CONFIG_IDE_CORE) += ide/core.o hw-obj-$(CONFIG_IDE_QDEV) += ide/qdev.o hw-obj-$(CONFIG_IDE_PCI) += ide/pci.o hw-obj-$(CONFIG_IDE_ISA) += ide/isa.o diff --git a/Makefile.target b/Makefile.target index eb4d010..a17de90 100644 --- a/Makefile.target +++ b/Makefile.target @@ -181,8 +181,7 @@ obj-y += rtl8139.o obj-y += e1000.o # Hardware support -obj-i386-y = ide/core.o -obj-i386-y += pckbd.o dma.o +obj-i386-y = pckbd.o dma.o obj-i386-y += vga.o obj-i386-y += mc146818rtc.o i8259.o pc.o obj-i386-y += cirrus_vga.o apic.o ioapic.o acpi.o piix_pci.o @@ -191,7 +190,7 @@ obj-i386-y += device-hotplug.o pci-hotplug.o smbios.o wdt_ib700.o obj-i386-y += debugcon.o multiboot.o # shared objects -obj-ppc-y = ppc.o ide/core.o ide/macio.o +obj-ppc-y = ppc.o ide/macio.o obj-ppc-y += vga.o dma.o openpic.o # PREP target obj-ppc-y += pckbd.o i8259.o mc146818rtc.o @@ -215,7 +214,6 @@ obj-mips-y += mips_addr.o mips_timer.o mips_int.o obj-mips-y += dma.o vga.o i8259.o rc4030.o obj-mips-y += vga-isa-mm.o obj-mips-y += g364fb.o jazz_led.o dp8393x.o -obj-mips-y += ide/core.o obj-mips-y += gt64xxx.o pckbd.o mc146818rtc.o acpi.o ds1225y.o obj-mips-y += piix4.o cirrus_vga.o obj-mips-y += mipsnet.o @@ -248,7 +246,6 @@ obj-cris-y += pflash_cfi02.o ifeq ($(TARGET_ARCH), sparc64) obj-sparc-y = sun4u.o pckbd.o apb_pci.o -obj-sparc-y += ide/core.o obj-sparc-y += vga.o obj-sparc-y += mc146818rtc.o obj-sparc-y += cirrus_vga.o @@ -268,7 +265,7 @@ obj-arm-y += arm-semi.o obj-arm-y += pxa2xx.o pxa2xx_pic.o pxa2xx_gpio.o pxa2xx_timer.o pxa2xx_dma.o obj-arm-y += pxa2xx_lcd.o pxa2xx_mmci.o pxa2xx_pcmcia.o pxa2xx_keypad.o obj-arm-y += pflash_cfi01.o gumstix.o -obj-arm-y += zaurus.o ide/core.o ide/microdrive.o spitz.o tosa.o tc6393xb.o +obj-arm-y += zaurus.o ide/microdrive.o spitz.o tosa.o tc6393xb.o obj-arm-y += omap1.o omap_lcdc.o omap_dma.o omap_clk.o omap_mmc.o omap_i2c.o obj-arm-y += omap2.o omap_dss.o soc_dma.o obj-arm-y += omap_sx1.o palm.o tsc210x.o @@ -282,7 +279,7 @@ obj-arm-y += syborg_virtio.o obj-sh4-y = shix.o r2d.o sh7750.o sh7750_regnames.o tc58128.o obj-sh4-y += sh_timer.o sh_serial.o sh_intc.o sh_pci.o sm501.o -obj-sh4-y += ide/core.o ide/mmio.o +obj-sh4-y += ide/mmio.o obj-m68k-y = an5206.o mcf5206.o mcf_uart.o mcf_intc.o mcf5208.o mcf_fec.o obj-m68k-y += m68k-semi.o dummy_m68k.o diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak index 02ad192..ea878a4 100644 --- a/default-configs/arm-softmmu.mak +++ b/default-configs/arm-softmmu.mak @@ -8,6 +8,7 @@ CONFIG_ECC=y CONFIG_SERIAL=y CONFIG_PTIMER=y CONFIG_SD=y +CONFIG_IDE_CORE=y CONFIG_MAX7310=y CONFIG_WM8750=y CONFIG_TWL92230=y diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak index 4dbf656..59eb670 100644 --- a/default-configs/i386-softmmu.mak +++ b/default-configs/i386-softmmu.mak @@ -9,6 +9,7 @@ CONFIG_I8254=y CONFIG_PCSPK=y CONFIG_USB_UHCI=y CONFIG_FDC=y +CONFIG_IDE_CORE=y CONFIG_IDE_QDEV=y CONFIG_IDE_PCI=y CONFIG_IDE_ISA=y diff --git a/default-configs/mips-softmmu.mak b/default-configs/mips-softmmu.mak index 345a093..cb48ed1 100644 --- a/default-configs/mips-softmmu.mak +++ b/default-configs/mips-softmmu.mak @@ -10,6 +10,7 @@ CONFIG_I8254=y CONFIG_PCSPK=y CONFIG_USB_UHCI=y CONFIG_FDC=y +CONFIG_IDE_CORE=y CONFIG_IDE_QDEV=y CONFIG_IDE_PCI=y CONFIG_IDE_ISA=y diff --git a/default-configs/mips64-softmmu.mak b/default-configs/mips64-softmmu.mak index 5900ee6..585d6bb 100644 --- a/default-configs/mips64-softmmu.mak +++ b/default-configs/mips64-softmmu.mak @@ -10,6 +10,7 @@ CONFIG_I8254=y CONFIG_PCSPK=y CONFIG_USB_UHCI=y CONFIG_FDC=y +CONFIG_IDE_CORE=y CONFIG_IDE_QDEV=y
Re: [Qemu-devel] Re: [libvirt] Supporting hypervisor specific APIs in libvirt
On 24.03.2010, at 21:32, Anthony Liguori wrote: On 03/24/2010 03:12 PM, Luiz Capitulino wrote: On Wed, 24 Mar 2010 21:49:45 +0200 Avi Kivitya...@redhat.com wrote: On 03/24/2010 06:42 PM, Luiz Capitulino wrote: On Wed, 24 Mar 2010 12:42:16 +0200 Avi Kivitya...@redhat.com wrote: So, at best qemud is a toy for people who are annoyed by libvirt. Is the reason for doing this in qemu because libvirt is annoying? Mostly. I don't see how adding yet another layer/daemon is going to improve ours and user's life (the same applies for libqemu). libvirt becomes optional. I think it should only be optional if all you want is to run a single VM in this case what seems to be missing on our side is a _real_ GUI, bundled with QEMU potentially written in a high-level language. That's a separate problem. Then we make virt-manager optional and this is good because we can sync features way faster and we don't have to care about _managing_ several VMs, our world in terms of usability and maintainability is about one VM. IMVHO, everything else should be done by third-party tools like libvirt, we just provide the means for it. We need to have a common management interface for third party tools. libvirt cannot be that today because of the fact that it doesn't support all of our features. What we need to figure out is how we can work with the libvirt team to fix this. The feature problem isn't the only one. It's also about ease of use. I personally find the qemu command line easier to use than anything libvirt-derived. So far, a libqemu.so with a flexible transport that could be used directly by a libvirt user (ala cairo/gdk type interactions) seems like the best solution to me. ACK. One thing I was thinking is that it might be a good idea to split the qemu command line version off as well. The qemu backend would then only speak QMP with an empty device case and the actual qemu command would run that, connect to it via QMP and instanciate everything. That way we'd get a consistent interface for management apps while keeping an easy to use CLI. Alex
Re: [Qemu-devel] [PATCH 2/2] machine opts framework
On Wed, Mar 24, 2010 at 02:43:35PM -0500, Anthony Liguori wrote: On 03/24/2010 02:26 PM, Glauber Costa wrote: This patch adds initial support for the -machine option, that allows command line specification of machine attributes (always relying on safe defaults). Besides its value per-se, it is the saner way we found to allow for enabling/disabling of kvm's in-kernel irqchip. A machine with in-kernel-irqchip could be specified as: -machine irqchip=apic-kvm And one without it: -machine irqchip=apic To demonstrate how it'd work, this patch introduces a choice between pic and apic, pic being the old-style isa thing. I started from a different place. See machine-qemuopts in my staging tree. I think we should combine efforts. Regards, Absolutely. I see little overlap between what we did. Just a comment on yours: -static void an5206_init(ram_addr_t ram_size, - const char *boot_device, - const char *kernel_filename, const char *kernel_cmdline, - const char *initrd_filename, const char *cpu_model) +static void an5206_init(QemuOpts *opts) { Since we're changing init functions anyway, I believe we should also pass a pointer to the machine structure. With that, we can avoing using the global current_machine.
Re: [Qemu-devel] Re: Compile files only once: some planning
On Wed, Mar 24, 2010 at 10:24:12PM +0200, Blue Swirl wrote: On 3/24/10, Michael S. Tsirkin m...@redhat.com wrote: On Wed, Mar 24, 2010 at 10:05:10PM +0200, Blue Swirl wrote: On 3/24/10, Michael S. Tsirkin m...@redhat.com wrote: On Tue, Mar 23, 2010 at 11:43:51PM +0200, Blue Swirl wrote: rtl8139.c, e1000.c: need to convert ldl/stl to cpu_physical_memory_read/write. I don't see how it would help. These still get target_phys_addr_t which is per-target. Further, a ton of devices do cpu_register_physical_memory/qemu_register_coalesced_mmio. These are also per target. I don't know what I was eating yesterday: there are no references to ldl/stl in either rtl8139 or e1000. In fact, the conversion is simple for the device itself, just add a property be. The attached patch performs this part. But now there is a bigger problem, how to pass the property to the device. It's not fair to require the user to remember to set it. I still don't fully understand how come e1000 cares about target endianness. It shouldn't. Maybe the real fix is to remove the byte swapping. The real fix is actually to add a layer handling bus byte swapping depending on how bus are connected. Currently it only works because all big endian boards QEMU emulates need to byteswap bus access, and none of the little endian boards need to do that. -- Aurelien Jarno GPG: 1024D/F1BCDB73 aurel...@aurel32.net http://www.aurel32.net
Re: [Qemu-devel] [PATCH 2/2] machine opts framework
On 03/24/2010 03:58 PM, Glauber Costa wrote: On Wed, Mar 24, 2010 at 02:43:35PM -0500, Anthony Liguori wrote: On 03/24/2010 02:26 PM, Glauber Costa wrote: This patch adds initial support for the -machine option, that allows command line specification of machine attributes (always relying on safe defaults). Besides its value per-se, it is the saner way we found to allow for enabling/disabling of kvm's in-kernel irqchip. A machine with in-kernel-irqchip could be specified as: -machine irqchip=apic-kvm And one without it: -machine irqchip=apic To demonstrate how it'd work, this patch introduces a choice between pic and apic, pic being the old-style isa thing. I started from a different place. See machine-qemuopts in my staging tree. I think we should combine efforts. Regards, Absolutely. I see little overlap between what we did. Just a comment on yours: -static void an5206_init(ram_addr_t ram_size, - const char *boot_device, - const char *kernel_filename, const char *kernel_cmdline, - const char *initrd_filename, const char *cpu_model) +static void an5206_init(QemuOpts *opts) { Since we're changing init functions anyway, I believe we should also pass a pointer to the machine structure. With that, we can avoing using the global current_machine. Yes, I had the same thought. For instance, with isa-pc is just pc_init with an extra parameter. If we had a structure like: typedef struct QEMUPCMachine { QEMUMachine parent; int pci_enabled; } QEMUPCMachine; Then you wouldn't need those dispatch functions. Not a huge win for x86 but for sparc and some arm boards, it's pretty significant. Regards, Anthony Liguori
Re: [Qemu-devel] Re: [libvirt] Supporting hypervisor specific APIs in libvirt
On Wed, 24 Mar 2010 21:54:09 +0100 Alexander Graf ag...@suse.de wrote: On 24.03.2010, at 21:32, Anthony Liguori wrote: On 03/24/2010 03:12 PM, Luiz Capitulino wrote: On Wed, 24 Mar 2010 21:49:45 +0200 Avi Kivitya...@redhat.com wrote: On 03/24/2010 06:42 PM, Luiz Capitulino wrote: On Wed, 24 Mar 2010 12:42:16 +0200 Avi Kivitya...@redhat.com wrote: So, at best qemud is a toy for people who are annoyed by libvirt. Is the reason for doing this in qemu because libvirt is annoying? Mostly. I don't see how adding yet another layer/daemon is going to improve ours and user's life (the same applies for libqemu). libvirt becomes optional. I think it should only be optional if all you want is to run a single VM in this case what seems to be missing on our side is a _real_ GUI, bundled with QEMU potentially written in a high-level language. That's a separate problem. Then we make virt-manager optional and this is good because we can sync features way faster and we don't have to care about _managing_ several VMs, our world in terms of usability and maintainability is about one VM. IMVHO, everything else should be done by third-party tools like libvirt, we just provide the means for it. We need to have a common management interface for third party tools. libvirt cannot be that today because of the fact that it doesn't support all of our features. What we need to figure out is how we can work with the libvirt team to fix this. The feature problem isn't the only one. It's also about ease of use. I personally find the qemu command line easier to use than anything libvirt-derived. Because your a developer and it does make sense to have a good CLI, on the other hand we also have use cases for a GUI bundled in QEMU and libvirt-derived things, which know how to deal with several VMs and integrates well with lots of other things.
Re: [Qemu-devel] Re: [libvirt] Supporting hypervisor specific APIs in libvirt
On 03/24/2010 04:25 PM, Luiz Capitulino wrote: I see it as a related problem, because what seems to be under discussion is the quality of our interfaces with humans and tools. Also, when we were discussing the usuability problems I remember that you *WARNING: I might be wrong here, please correct me if so* you said that you don't push users to libvirt because it's out of sync with our features. Yes. The point is that, even if this true and even if we solve that, I don't think it will solve the problem of a good experience for a 'single VM user', because libvirt is more than that and people will likely be annoyed as much as they are today. I believe this problem is up to us to solve. With my qemu hat on, I'm happy to ignore libvirt and say we need to own our interfaces and to compete with libvirt for users. But with my Linux virtualization hat on, I want to see a single management interface that users can use without having to make a choice between libvirt features or libqemu features. Then we make virt-manager optional and this is good because we can sync features way faster and we don't have to care about _managing_ several VMs, our world in terms of usability and maintainability is about one VM. IMVHO, everything else should be done by third-party tools like libvirt, we just provide the means for it. We need to have a common management interface for third party tools. QMP? :-) Only if QMP is compatible with libvirt. I don't want a user to have to choose between QMP and libvirt. So far, a libqemu.so with a flexible transport that could be used directly by a libvirt user (ala cairo/gdk type interactions) seems like the best solution to me. I tend to disagree. First, I think we should invest our time and effort on the text protocol business, which is QMP. Having yet another public interface will likely split efforts a bit and will make clients' life harder (which one should I choose? What if they get out of sync?). Not to mention that I think Paul has a point, if QMP is not useful here, why do we have it in the first place (vs. a C library from the beginning)? You mentioned dynamic dispatch, but this is useful only for C clients right? If so, what C clients you expected beyond libvirt? Users want a C API. I don't agree that libvirt is the only C interface consumer out there. Note that libvirt has added a new events API recently. The second most important point for me is: why do you believe that libqemu.so is going to improve things? Do you expect that libvirt will sync faster? With GDK and Cairo, when Cairo adds a new feature, GDK doesn't have to do anything to support it. Users just get a cairo context from GDK and use the cairo API directly. GDK provides a higher level interface for 2d operations that is more platform agnostic, and users can choice to use that or write directly to the cairo API. If this is the case, I think it will be as slower as it's currently, as the problem is not the availability of interfaces, but most likely community integration. I like the idea of having a transient qemu-specific API in libvirt, as suggested by someone in this thread. I really think what we want is for a libvirt user to be able to call libqemu functions directly. There shouldn't have to be libvirt specific functions for every operation we expose. Regards, Anthony Liguori
Re: [Qemu-devel] TBL register permissions for PPC
Hello All, Please review patch for TBL SPR read access for generic PPC. *Description:* POWER specification docs define TBL/TBU SPRs as readable in user and privileged modes. Therefore SPRs permissions were changed in gen_tbl function in target-ppc/translate_init.c file. *Testing:* Tested with vxworks-6.2 bsp and OS on custom qemu board that includes ppc405 emulated core BR, Dmitry Ilyevsky On Wed, Dec 2, 2009 at 2:23 AM, Alexander Graf ag...@suse.de wrote: On 01.12.2009, at 19:33, Dima Ilyevsky wrote: Hello All, I have a question about read permissions of TBL SPR for all ppc processors: I have discovered that my application, compiled by WindRiver diab compiler and running in vxworks OS on ppc405 architecture bumps into exception generated when trying to read TBL or TBU registers: Unless Linux does something funky, mftlb, mftbu (and mftb on 64 bit) are readable from PR=1. int main() { long tbu=0, tbl=0; asm(mftbu %0 : =r (tbu)); asm(mftbl %0 : =r (tbl)); printf(TB: %#x %#x\n, tbl, tbu); } ag...@lychee:/tmp ./mftb TB: 0xc0397180 0x603 However it can't be written to: asm(mttbl %0 : : r (tbl)); ag...@lychee:/tmp ./mftb Illegal instruction So yes, I'd suspect a bug in qemu here. Feel free to send a patch. Alex From 141bf29f5355f163205c57e98590730ed15bfb86 Mon Sep 17 00:00:00 2001 From: n/a inst...@ubuntu-desktop.(none) Date: Thu, 25 Mar 2010 00:22:25 +0300 Subject: [PATCH] Generic PowerPC time base SPR should be accessible in user/priv modes for reading --- target-ppc/translate_init.c |8 1 files changed, 4 insertions(+), 4 deletions(-) diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c index db4dc17..e8eadf4 100644 --- a/target-ppc/translate_init.c +++ b/target-ppc/translate_init.c @@ -777,16 +777,16 @@ static void gen_tbl (CPUPPCState *env) spr_read_tbl, SPR_NOACCESS, 0x); spr_register(env, SPR_TBL, TBL, - SPR_NOACCESS, SPR_NOACCESS, - SPR_NOACCESS, spr_write_tbl, + spr_read_tbl, SPR_NOACCESS, + spr_read_tbl, spr_write_tbl, 0x); spr_register(env, SPR_VTBU, TBU, spr_read_tbu, SPR_NOACCESS, spr_read_tbu, SPR_NOACCESS, 0x); spr_register(env, SPR_TBU, TBU, - SPR_NOACCESS, SPR_NOACCESS, - SPR_NOACCESS, spr_write_tbu, + spr_read_tbu, SPR_NOACCESS, + spr_read_tbu, spr_write_tbu, 0x); } -- 1.7.0
[Qemu-devel] Re: [PATCH v2] update bochs vbe interface
Gerd Hoffmann kra...@redhat.com wrote: The bochs vbe interface got a new register a while back, which specifies the linear framebuffer size in 64k units. This patch adds support for the new register to qemu. With this patch applied vgabios 0.6c works with qemu. [ v2: Don't savevm the new register. Doing so breaks migration, and as it carries read-only information for the guest there is no need to save it. ] It don't compile (as expected). VMSTATE_UINT16_ARRAY() checks that you sent the whole array. /mnt/kvm/qemu/qemu-negotiate/hw/vga.c:2219: error: invalid operands to binary - (have ‘uint16_t (*)[10]’ and ‘uint16_t (*)[11]’) make[1]: *** [vga.o] Error 1 make[1]: *** Waiting for unfinished jobs ^Cmake[1]: *** [translate.o] Interrupt make[1]: *** [op_helper.o] Interrupt make: *** [subdir-x86_64-softmmu] Interrupt --- hw/vga.c |5 +++-- hw/vga_int.h |6 +- 2 files changed, 8 insertions(+), 3 deletions(-) diff --git a/hw/vga.c b/hw/vga.c index 6a1a059..f9e07cf 100644 --- a/hw/vga.c +++ b/hw/vga.c @@ -1955,7 +1955,8 @@ void vga_common_reset(VGACommonState *s) #ifdef CONFIG_BOCHS_VBE s-vbe_index = 0; memset(s-vbe_regs, '\0', sizeof(s-vbe_regs)); -s-vbe_regs[VBE_DISPI_INDEX_ID] = VBE_DISPI_ID0; +s-vbe_regs[VBE_DISPI_INDEX_ID] = VBE_DISPI_ID5; Now, to show my ignorance, what does this change means? I can't understand it looking at the whole file (but I don't understand vga.c too well anyways). Later, Juan.
[Qemu-devel] Re: [PATCH] update bochs vbe interface
Gerd Hoffmann kra...@redhat.com wrote: On 03/24/10 18:04, Juan Quintela wrote: Gerd Hoffmannkra...@redhat.com wrote: The bochs vbe interface got a new register a while back, which specifies the linear framebuffer size in 64k units. This patch adds support for the new register to qemu. With this patch applied vgabios 0.6c works with qemu. Signed-off-by: Gerd Hoffmannkra...@redhat.com It breaks migration. vga.c:2218:VMSTATE_UINT16_ARRAY(vbe_regs, VGACommonState, VBE_DISPI_INDEX_NB), 2218 is the interesting line. Can we freeze this patch until the subsections stuff is done? Yea, I see. Well, the new register doesn't carry any state, it is read-only information for the guest. So the easy way out is to simply not save it as we don't have to. Thinking a bit more about it. Humm, I think it means. Can you migrate from a new vga to an old one, and maintain it working? -s-vbe_regs[VBE_DISPI_INDEX_ID] = VBE_DISPI_ID0; +s-vbe_regs[VBE_DISPI_INDEX_ID] = VBE_DISPI_ID5; +s-vbe_regs[VBE_DISPI_INDEX_VIDEO_MEMORY_64K] = s-vram_size / (64 * 1024); After migration, vbe_regs[VBE_DISPI_INDEX_ID] would have the value VBE_DISPI_ID5, but vbe_regs[VBE_DISPI_INDEX_VIDEO_MEMORY_64K] will have any random value, no? cheers, Gerd
Re: [Qemu-devel] Re: Compile files only once: some planning
But now there is a bigger problem, how to pass the property to the device. It's not fair to require the user to remember to set it. It should not be a property of the device. All devices have a native endianness (for PCI this is little-endian), and the intermediate busses/interconnects should determine whether byteswapping occurs. Paul
[Qemu-devel] Significant performance regression in qemu-system-mips.
I have a native build under qemu that gets killed if it doesn't produce a line of output for 60 seconds (hang detection enforced by the host monitoring qemu's stdout with --nographic, not from within qemu). In the most recent release version, it never came close to triggering on mips with a 30 second timeout. In the current -git version (well, as of Thursday anyway), it triggers frequently (about 90% of the time) even with a 60 second timeout. I bisected it to this: commit 1828be316f6637d43dd4c4f5f32925b17fb8107f Author: Paolo Bonzini pbonz...@redhat.com Date: Wed Mar 10 11:38:41 2010 +0100 more alarm timer cleanup The timer_alarm_pending variable is related to the alarm timer but not placed in the struct. Also, in qemu_mod_timer the wrong flag was being tested: the timer is rearmed in the alarm timer bottom half, so the right flag to test there is the pending flag. Finally, I hoisted the NULL checks from alarm_has_dynticks to host_alarm_handler. Signed-off-by: Paolo Bonzini pbonz...@redhat.com Signed-off-by: Anthony Liguori aligu...@us.ibm.com Reverting that patch fixed it (git show HEAD | patch -R -p1), by which I mean three consecutive runs with 30 second timeout didn't trigger the hang detection. Unfortunately, I can't revert that patch in current origin/master because most of the hunks fail... Help? Rob -- Latency is more important than throughput. It's that simple. - Linus Torvalds