Re: [libvirt] NFS over RDMA small block DIRECT_IO bug
On 09/04/2012 03:04 PM, Myklebust, Trond wrote:
> On Tue, 2012-09-04 at 11:31 +0200, Andrew Holway wrote:
>> Hello.
>>
>> # Avi Kivity avi(a)redhat recommended I copy kvm in on this. It would also
>> seem relevant to libvirt. #
>>
>> I have a Centos 6.2 server and Centos 6.2 client.
>>
>> [root@store ~]# cat /etc/exports
>> /dev/shm 10.149.0.0/16(rw,fsid=1,no_root_squash,insecure)
>> (I have tried with non-tmpfs targets also)
>>
>> [root@node001 ~]# cat /etc/fstab
>> store.ibnet:/dev/shm /mnt nfs rdma,port=2050,defaults 0 0
>>
>> I wrote a little for-loop one-liner that dd'd the CentOS net install image
>> to a file called 'hello' then checksummed that file. Each iteration uses a
>> different block size.
>>
>> Non-DIRECT_IO seems to work fine. DIRECT_IO with 512-byte, 1K and 2K block
>> sizes gets corrupted.
>
> That is expected behaviour. DIRECT_IO over RDMA needs to be page aligned
> so that it can use the more efficient RDMA READ and RDMA WRITE memory
> semantics (instead of the SEND/RECEIVE channel semantics).

Shouldn't subpage requests fail then? O_DIRECT block requests fail for subsector writes, instead of corrupting your data. Hopefully this is documented somewhere.

--
error compiling committee.c: too many arguments to function

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
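For reference, the test described above was probably something like the following. This is a reconstruction, not the reporter's actual one-liner: it writes a random local file instead of the CentOS netinstall image, uses a local scratch directory instead of the NFS-over-RDMA mount, and the block-size list is a guess. It falls back to buffered I/O where the local filesystem rejects O_DIRECT.

```shell
# Hypothetical reconstruction of the dd/checksum loop from the report.
set -e
src=$(mktemp)
dst=$(mktemp -d)
dd if=/dev/urandom of="$src" bs=1M count=4 2>/dev/null
ref=$(sha256sum "$src" | awk '{print $1}')
for bs in 512 1K 2K 4K 1M; do
    # try O_DIRECT first; fall back to buffered I/O on filesystems
    # (e.g. tmpfs) that reject it
    dd if="$src" of="$dst/hello" bs="$bs" oflag=direct 2>/dev/null ||
        dd if="$src" of="$dst/hello" bs="$bs" 2>/dev/null
    sum=$(sha256sum "$dst/hello" | awk '{print $1}')
    [ "$sum" = "$ref" ] && status=OK || status=CORRUPT
    echo "bs=$bs $status"
done
rm -rf "$src" "$dst"
```

On a local filesystem every block size should report OK; per the report, the sub-page sizes (512, 1K, 2K) came back corrupted when run with O_DIRECT against the NFS-over-RDMA mount.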
Re: [libvirt] [User question] Huge buffer size on KVM host
On 08/16/2012 05:54 PM, Martin Wawro wrote:
> On Aug 15, 2012, at 2:57 PM, Avi Kivity wrote:
>>> We are using logical volumes and the cache is set to 'none'.
>>
>> Strange, that should work without any buffering.
>>
>> What are the contents of
>>
>>   /sys/block/sda/queue/hw_sector_size
>>
>> and
>>
>>   /sys/block/sda/queue/logical_block_size
>>
>> ?
>
> Hi Avi,
>
> It seems that the kernel on that particular machine is too old; those
> entries are not featured. We checked on a comparable setup with a newer
> kernel and both entries were set to 512.
>
> We also took a third, more thorough look at the caching. It turns out that
> virt-manager does not seem to honor the caching adjusted in the GUI
> correctly. We disabled caching on all virtual devices for this particular
> VM, and checking with "ps -fxal" revealed that only one of those devices
> (and a rather small one too) had this set. We corrected this in the XML
> file directly, and the buffer size currently resides at around 1.8 GB after
> rebooting the VM (the only virtio device not having the cache=none option
> set is now the (non-mounted) cdrom).

cc += libvirt-list

Is there a reason that cdroms don't get cache=none?
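For readers following along: the per-disk cache setting being discussed lives on the `<driver>` element in the libvirt domain XML. A minimal sketch (the volume path and target device here are illustrative, not from the reporter's setup):

```xml
<disk type='block' device='disk'>
  <driver name='qemu' type='raw' cache='none'/>
  <source dev='/dev/vg0/guest-root'/>
  <target dev='vda' bus='virtio'/>
</disk>
```

With cache='none' libvirt passes cache=none to qemu for that drive, so the host page cache is bypassed and the large host-side buffer growth described above should not occur for that device.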
Re: [libvirt] QEMU CPU model versioning/compatibility (was Re: KVM call minutes July 31th)
On 07/31/2012 06:14 PM, Eduardo Habkost wrote:
> On Tue, Jul 31, 2012 at 04:32:05PM +0200, Juan Quintela wrote:
>> - 1.2 plans for CPU model versioning/compatibility (eduardo)
>>   (global properties vs QOM vs qdev)
>>   how to do it? configuration file? moving back to the code?
>>   different external interface from internal one
>
> (CCing libvir-list)
>
> So, the problem is (please correct me if I am wrong):
>
> - libvirt expects the CPU models from the current config file to be
>   provided by QEMU. libvirt won't write a cpudef config file.
> - Fixing compatibility problems on the CPU models (even the ones that
>   are in config files) is expected to be QEMU's responsibility.
>
> Moving the CPU models inside the code is a step backwards, IMO. I don't
> think loading some kind of data from an external file (provided and
> maintained by QEMU itself) should be considered something special and
> magic, and make the data there a second-class citizen (that QEMU refuses
> to give proper support and keep compatibility for).

I agree.

> But if it will make us stop running in circles every time something
> related to those files needs to be changed (remember the -no-user-config
> discussion?), I think I will accept that.

The issue is that we have a lot of machinery for backwards compatibility in the code, and none in the cpu definitions parser. Yes, we could mark up the cpu definitions so it could be text driven, but that's effort that could be spent elsewhere. Moving it to a data-driven C implementation that can rely on the existing backwards-compat code is a good tradeoff IMO.
Re: [libvirt] Constantly changing USB product ID
On 03/28/2012 02:41 PM, Avi Kivity wrote:
> On 03/27/2012 05:48 PM, Jaap Winius wrote:
>> Hi folks,
>>
>> Recently I learned how to configure KVM with USB pass-through
>> functionality. In my case I configured my guest domain with this block
>> of code:
>>
>>
>>
>> At first this worked fine, but then later the guest domain refused to
>> start because the USB device was absent. When I checked, I found that
>> its product ID had mysteriously changed to 1771. Later it was back at
>> 1772. Now it appears that the USB device I am dealing with has a
>> product ID that changes back and forth between 1771 and 1772 at random.
>>
>> Apparently, the Windows program running on the guest domain is
>> designed to deal with this nonsense, but the question is: can KVM be
>> configured to deal with it? Something like
>> would be useful, but that doesn't work.
>>
>> Any ideas would be much appreciated.
>
> This is really strange. What kind of device is this?
>
> I've filed an RFE [1] for virt-manager for assigning USB host devices
> opportunistically; that is, if they're plugged in they're assigned, and
> if not, the guest starts without them. If it were implemented, you could
> assign both 0x1771 and 0x1772, and whichever ID the device has today
> would get assigned.
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=804432

btw, the correct place for this discussion is likely the libvirt mailing list, or maybe the virt-manager list if it exists.
Re: [libvirt] [Qemu-devel] Modern CPU models cannot be used with libvirt
On 03/26/2012 09:00 PM, Anthony Liguori wrote:
>>> Yes, that's one reason. But maybe a user wants to have a whole
>>> different set of machine types and doesn't care to have the ones we
>>> provide. Why prevent a user from doing this?
>>
>> How are we preventing a user from doing it? In what way is -nodefconfig
>> helping it?
>
> Let me explain it in a different way, perhaps.
>
> We launch smbd in QEMU in order to do file sharing over slirp. One of
> the historic problems we've had is that we don't assume root
> privileges, yet want to be able to run smbd without using any of the
> system configuration files.
>
> You can do this by specifying -s with the config file, and then in the
> config file you can overload the various default paths (like private
> dir, lock dir, etc.). In some cases, earlier versions of smbd didn't
> allow you to change private dir.
>
> You should be able to tell a well-behaved tool not to read any
> configuration/data files and explicitly tell it where/how to read
> them. We cannot exhaustively anticipate every future use case of QEMU.

100% agree. But that says nothing about a text file that defines "westmere" as a set of cpu flags, as long as we allow the user to define "mywestmere" as a different set. That is because target-x86_64.cfg does not configure anything; it just defines a macro, which qemu doesn't force you to use.

> But beyond the justification for -nodefconfig, the fact is that it
> exists today and has a specific semantic. If we want to have a
> different semantic, we should introduce a new option (-no-user-config).

Sure.
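A user-defined "mywestmere" of the kind mentioned above would be a [cpudef] stanza in a config file, following the format discussed elsewhere in this thread. A rough sketch only: the field names mirror the era's target-x86_64.cfg format, and the values here are illustrative, not a complete or correct Westmere definition:

```
[cpudef]
   name = "mywestmere"
   level = "11"
   vendor = "GenuineIntel"
   family = "6"
   model = "44"
   stepping = "1"
   feature_edx = "sse2 sse fxsr mmx pat cmov pge sep apic cx8 mce pae msr tsc pse de fpu"
```

The point being argued: such a stanza defines a name the user may pass to -cpu; merely loading it configures nothing.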
Re: [libvirt] [Qemu-devel] Modern CPU models cannot be used with libvirt
On 03/26/2012 09:03 PM, Anthony Liguori wrote:
> I think what we want to move toward is a -no-machine option which
> allows a user to explicitly build a machine from scratch. That is:
>
> qemu -no-machine -device i440fx,id=host -device isa-serial,chr=chr0 ...

I'd call it -M bare-1.1, so that it can be used to override driver properties in 1.2+. So we'd have:

  # default machine for this version
  qemu / qemu -M pc

  # an older version's pc
  qemu -M pc-1.1

  # just a chassis, bring your own screwdriver
  qemu -M bare

  # previous generation chassis, beige
  qemu -M bare-1.1

That is because -M not only specifies the components that go into the machine, it also alters other devices you add to it. This also helps preserve the planet's dwindling supply of command line options.
Re: [libvirt] [Qemu-devel] Modern CPU models cannot be used with libvirt
On 03/26/2012 01:24 PM, Jiri Denemark wrote:
> ...
>>> The command line becomes unstable if you use -nodefconfig.
>>
>> -no-user-config solves this but I fully expect libvirt would continue
>> to use -nodefconfig.
>
> Libvirt uses -nodefaults -nodefconfig because it wants to fully control
> what the virtual machine will look like (mainly in terms of devices). In
> other words, we don't want any devices to just magically appear without
> libvirt knowing about them. -nodefaults gets rid of default devices that
> are built directly into qemu. Since users can put any devices or command
> line options (such as enable-kvm) into qemu configuration files in
> @SYSCONFDIR@, we need to avoid reading those files as well. Hence we use
> -nodefconfig. However, we would still like qemu to read CPU definitions,
> machine types, etc. once they become externally loaded configuration (or
> however we decide to call it). That said, when CPU definitions are moved
> into @DATADIR@ and -no-user-config is introduced, I don't see any reason
> for libvirt to keep using -nodefconfig.
>
> I actually like
>   -no-user-config
> more than
>   -nodefconfig -readconfig @DATADIR@/...
> since it would avoid additional magic to detect which files libvirt
> should explicitly pass to -readconfig, but basically any approach that
> would allow us to read files only from @DATADIR@ is much better than what
> we have with -nodefconfig now.

That's how I see it as well.
Re: [libvirt] [Qemu-devel] Modern CPU models cannot be used with libvirt
On 03/25/2012 08:11 PM, Anthony Liguori wrote:
>> I don't think -nodefconfig (as defined) is usable, since there is no way
>> for the user to tell what it means short of reading those files.
>
> *if the user doesn't know specifics about this QEMU version.
>
> You make the assumption that all users are going to throw arbitrary
> options at arbitrary QEMU versions. That's certainly an important
> use-case but it's not the only one.

If a Fedora user is using qemu, then their qemu version will change every six months. Their options are to update their scripts/management tool in step, or not have their management tool use -nodefconfig. The same holds for anyone using qemu from upstream, since that's approximately the qemu release cycle.

>> -no-user-config is usable; I think it also needs to mean that qemu
>> without -M/-cpu/-m options will error out?
>
> You're confusing -nodefaults (or something stronger than -nodefaults)
> with -no-user-config.

Right.

> Yes, the distinctions are confusing. It's not all fixable tomorrow.
> If we take my config refactoring series, we can get 90% of the way
> there soon, but Paolo has a more thorough refactoring.

"#define westmere blah" is not configuration, otherwise the meaning of configuration will drift over time. -cpu blah is, of course.

>>> It's the same mechanism, but the above would create two classes of
>>> default configuration files and then it becomes a question of how
>>> they're used.
>>
>> Confused.
>
> We don't have a formal concept of -read-definition-config and
> -read-configuration-config.
>
> There's no easy or obvious way to create such a concept either, nor do
> I think the distinction is meaningful to users.

Definition files should be invisible to users. They're part of the implementation. If we have a file that says

  pc-1.1 = piix + cirrus + memory(128) + ...

then it's nobody's business whether it's in a text file or a .c file. Of course it's nice to allow users to load their own definition files, but that's strictly a convenience.

>> Exactly. The types are no different, so there's no reason to
>> discriminate against types that happen to live in qemu-provided data
>> files vs. qemu code. They aren't instantiated, so we lose nothing by
>> creating the factories (just so long as the factories aren't
>> mass-producing objects).
>
> At some point, I'd like to have type modules that are shared objects.
> I'd like QEMU to start with almost no builtin types and allow the user
> to configure which modules get loaded.
>
> In the long term, I'd like QEMU to be a small, robust core with the
> vast majority of code relegated to modules, with the user ultimately in
> control of module loading.
>
> Yes, I'd want some module autoloading system, but there should always
> be a way to launch QEMU without loading any modules and then load a
> very specific set of modules (as defined by the user).
>
> You can imagine this being useful for something like Common Criteria
> certifications.

Okay.

> It's obviously defined for a given release, just not defined long term.
>
>> If I see something like -nodefconfig, I assume it will create a bare
>> bones guest that will not depend on any qemu defaults and will be
>> stable across releases.
>
> That's not even close to what -nodefconfig is. That's pretty much
> what -nodefaults is, but -nodefaults has also had a fluid definition
> historically.

Okay. Let's just make sure to document -nodefconfig as version specific and -nodefaults as the stable way to create a bare bones guest (and define exactly what that means).
Re: [libvirt] [Qemu-devel] Modern CPU models cannot be used with libvirt
On 03/25/2012 08:01 PM, Anthony Liguori wrote:
>> I don't think this came out of happiness, but despair. Seriously,
>> keeping compatibility is one of the things we work hardest to achieve,
>> and we can't manage it for our command line?
>
> I hate to burst your bubble, but we struggle with, and rarely maintain,
> the level of compatibility you're seeking to have.
>
> I agree with you that we need to do a better job maintaining
> compatibility, which is why I'm trying to clearly separate the things
> that we will never break from the things that will change over time.
>
> -nodefconfig is a moving target. If you want stability, don't use
> it. If you just want to prevent the user's /etc/qemu stuff from being
> loaded, use -no-user-config.

Fine, but let's clearly document it as such. Note that just saying it doesn't load any configuration files isn't sufficient. We have to say that it kills Westmere and some of its friends, but preserves others like qemu64. Otherwise it's impossible to use it except by trial and error.

>>> I'm not saying that backwards compat isn't important--it is. But
>>> there are users who are happy to live on the bleeding edge.
>>
>> That's fine, but I don't see how -nodefconfig helps them. All it does
>> is take away the building blocks (definitions) that they can use when
>> setting up their configuration.
>
> Yes, this is a feature.

I don't see how, but okay.

>>>> Suppose we define the southbridge via a configuration file. Does
>>>> that mean we don't load it any more?
>>>
>>> Yes. If I want the leanest and meanest version of QEMU that will
>>> start in the smallest number of milliseconds, then being able to tell
>>> QEMU not to load configuration files and create a very specific
>>> machine is a Good Thing. Why exclude users from being able to do this?
>>
>> So is this the point? Reducing startup time?
>
> Yes, that's one reason. But maybe a user wants to have a whole
> different set of machine types and doesn't care to have the ones we
> provide. Why prevent a user from doing this?

How are we preventing a user from doing it? In what way is -nodefconfig helping it?

> Maybe they have a management tool that attempts to totally hide QEMU
> from the end user and exposes a different set of machine types. It's
> certainly more convenient for something like the Android emulator to
> only have to deal with QEMU knowing about the 4 types of machines that
> it specifically supports.

If it supports four types, it should always pass one of them to qemu. The only thing -nodefconfig adds is breakage when qemu moves something that one of those four machines relies on to a config file.
Re: [libvirt] [Qemu-devel] Modern CPU models cannot be used with libvirt
On 03/25/2012 03:26 PM, Anthony Liguori wrote:
>>> We would continue to have Westmere/etc in QEMU exposed as part of the
>>> user configuration. But I don't think it makes a lot of sense to have
>>> to modify QEMU any time a new CPU comes out.
>>
>> We have to. New features often come with new MSRs which need to be live
>> migrated, and of course the cpu flags as well. We may push all these to
>> qemu data files, but this is still qemu. We can't let a management tool
>> decide that cpu feature X is safe to use on qemu version Y.
>
> I think QEMU should own CPU definitions.

Agree.

> I think a management tool should have the choice of whether they are
> used, though, because they are a policy IMHO.
>
> It's okay for QEMU to implement some degree of policy as long as a
> management tool can override it with a different policy.

Sure. We can have something like:

  # default machine's westmere
  qemu -cpu westmere

  # pc-1.0's westmere
  qemu -M pc-1.0 -cpu westmere

  # pc-1.0's westmere, without nx
  qemu -M pc-1.0 -cpu westmere,-nx

  # specify everything in painful detail
  qemu -cpu vendor=Foo,family=17,model=19,stepping=3,maxleaf=12,+fpu,+vme,leaf10eax=0x1234567,+etc
Re: [libvirt] [Qemu-devel] Modern CPU models cannot be used with libvirt
On 03/25/2012 05:30 PM, Anthony Liguori wrote:
> On 03/25/2012 10:18 AM, Avi Kivity wrote:
>> On 03/25/2012 05:07 PM, Anthony Liguori wrote:
>>>> As long as qemu -nodefconfig -cpu westmere -M pc-1.1
>>>
>>> -nodefconfig is going to eventually mean that -cpu westmere and -M
>>> pc-1.1 will not work.
>>>
>>> This is where QEMU is going. There is no reason that a normal user
>>> should ever use -nodefconfig.
>>
>> I don't think anyone or anything can use it, since its meaning is not
>> well defined. "Not read any configuration files", where parts of qemu
>> are continually moved out to configuration files, means it's a moving
>> target.
>
> I think you assume that all QEMU users care about forward and
> backwards compatibility on the command line above all else.
>
> That's really not true. The libvirt folks have stated repeatedly that
> command line backwards compatibility is not critical to them. They
> are happy to require that a new version of QEMU requires a new version
> of libvirt.

I don't think this came out of happiness, but despair. Seriously, keeping compatibility is one of the things we work hardest to achieve, and we can't manage it for our command line?

> I'm not saying that backwards compat isn't important--it is. But
> there are users who are happy to live on the bleeding edge.

That's fine, but I don't see how -nodefconfig helps them. All it does is take away the building blocks (definitions) that they can use when setting up their configuration.

>> Suppose we define the southbridge via a configuration file. Does that
>> mean we don't load it any more?
>
> Yes. If I want the leanest and meanest version of QEMU that will
> start in the smallest number of milliseconds, then being able to tell
> QEMU not to load configuration files and create a very specific
> machine is a Good Thing. Why exclude users from being able to do this?

So is this the point? Reducing startup time? I can't say I see the reason to invest so much effort in shaving a millisecond or less from this, but if we did want to, the way would be lazy loading of the configuration, where items are parsed as they are referenced.
Re: [libvirt] [Qemu-devel] Modern CPU models cannot be used with libvirt
On 03/25/2012 05:26 PM, Anthony Liguori wrote:
>> Put the emphasis around *configuration*.
>
> So how about:
>
> 1) Load ['@SYSCONFDIR@/qemu/qemu.cfg',
>    '@SYSCONFDIR@/qemu/target-@ARCH@.cfg',
>    '@DATADIR@/system.cfg', '@DATADIR@/system-@ARCH@.cfg']
>
> 2) system-@ARCH@.cfg will contain:
>
> [system]
> readconfig=@DATADIR@/target-@a...@-cpus.cfg
> readconfig=@DATADIR@/target-@a...@-machine.cfg
>
> 3) -nodefconfig will not load any configuration files from DATADIR or
> SYSCONFDIR. -no-user-config will not load any configuration files
> from SYSCONFDIR.

What, more options?

I don't think -nodefconfig (as defined) is usable, since there is no way for the user to tell what it means short of reading those files. -no-user-config is usable; I think it also needs to mean that qemu without -M/-cpu/-m options will error out, since the default machine/cpu types are default configuration.

>> "#define westmere blah" is not configuration, otherwise the meaning of
>> configuration will drift over time.
>>
>> -cpu blah is, of course.
>
> It's the same mechanism, but the above would create two classes of
> default configuration files and then it becomes a question of how
> they're used.

Confused. The file defines westmere as an alias for a grab bag of options. Whether it's loaded or not is immaterial, unless someone uses one of the names within.

>>> But you would agree, a management tool should be able to control
>>> whether class factories get loaded, right?
>>
>> No, why? But perhaps I don't entirely get what you mean by "class
>> factories".
>>
>> Aren't they just implementations of
>>
>>   virtual Device *new_instance(...) = 0;
>>
>> If so, why not load them?
>
> No, a class factory creates a new type of class. -cpudef will
> ultimately call type_register() to create a new QOM-visible type.
> From a management tool's perspective, the type is no different than a
> built-in type.

Exactly. The types are no different, so there's no reason to discriminate against types that happen to live in qemu-provided data files vs. qemu code. They aren't instantiated, so we lose nothing by creating the factories (just so long as the factories aren't mass-producing objects).

>>> Otherwise, the meaning of -nodefconfig changes as more stuff is moved
>>> out of .c and into .cfg.
>>
>> What's the problem with this?
>
> The command line becomes unstable if you use -nodefconfig.

Wait, that was my line.

> -no-user-config solves this but I fully expect libvirt would continue
> to use -nodefconfig.

I don't see how libvirt can use -nodefconfig with the fluid meaning you attach to it, or what it gains from it.

>> -nodefconfig = create an empty machine, don't assume anything (= don't
>> read qemu.cfg), let me build it out of all those lego bricks. Those can
>> be defined in code or in definition files in /usr/share, I don't care.
>>
>> Maybe that's -nodevices -vga none. But in this case I don't see the
>> point in -nodefconfig. Not loading target-x86_64.cfg doesn't buy the
>> user anything, since it wouldn't affect the guest in any way.
>
> -nodefconfig doesn't mean what you think it means. -nodefconfig
> doesn't say anything about the user-visible machine.
>
> -nodefconfig tells QEMU not to read any configuration files at start
> up. This has an undefined effect on the user-visible machine that
> depends on the specific version of QEMU.

Then it's broken. How can anyone use something that has an undefined effect? If I see something like -nodefconfig, I assume it will create a bare bones guest that will not depend on any qemu defaults and will be stable across releases. I don't think anyone will understand -nodefconfig to be something version dependent without reading the qemu management tool author's guide.
Re: [libvirt] [Qemu-devel] Modern CPU models cannot be used with libvirt
On 03/25/2012 05:07 PM, Anthony Liguori wrote:
>> As long as qemu -nodefconfig -cpu westmere -M pc-1.1
>
> -nodefconfig is going to eventually mean that -cpu westmere and -M
> pc-1.1 will not work.
>
> This is where QEMU is going. There is no reason that a normal user
> should ever use -nodefconfig.

I don't think anyone or anything can use it, since its meaning is not well defined. "Not read any configuration files", where parts of qemu are continually moved out to configuration files, means it's a moving target.

Suppose we define the southbridge via a configuration file. Does that mean we don't load it any more?
Re: [libvirt] [Qemu-devel] Modern CPU models cannot be used with libvirt
On 03/25/2012 04:59 PM, Anthony Liguori wrote:
> On 03/25/2012 09:46 AM, Avi Kivity wrote:
>> On 03/25/2012 04:36 PM, Anthony Liguori wrote:
>>>> Apart from the command line length, it confuses configuration with
>>>> definition.
>>>
>>> There is no distinction with what we have today. Our configuration
>>> file basically corresponds to command line options, and as there is
>>> no distinction in command line options, there's no distinction in the
>>> configuration format.
>>
>> We don't have command line options for defining, only configuring.
>
> That's an oversight. There should be a -cpudef option. It's a
> QemuOptsList.
>
>> Again, defining = #define
>
> I think -global fits your definition of #define...

Yes (apart from the corner case of modifying a default-instantiated device).

>>> B) A management tool has complete control over cpu definitions
>>> without modifying the underlying filesystem. -nodefconfig will
>>> prevent it from loading, and the management tool can explicitly load
>>> the QEMU definition (via -readconfig, potentially using a /dev/fd/N
>>> path) or it can define its own cpu definitions.
>>
>> Why does -nodefconfig affect anything?
>
> Because -nodefconfig means "don't load *any* default configuration
> files".

Put the emphasis around *configuration*. "#define westmere blah" is not configuration, otherwise the meaning of configuration will drift over time. -cpu blah is, of course.

>> The file defines westmere as an alias for a grab bag of options.
>> Whether it's loaded or not is immaterial, unless someone uses one of
>> the names within.
>
> But you would agree, a management tool should be able to control
> whether class factories get loaded, right?

No, why? But perhaps I don't entirely get what you mean by "class factories". Aren't they just implementations of

  virtual Device *new_instance(...) = 0;

If so, why not load them?

> So what's the mechanism to do this?
>
>>> C) This model maps to any other type of class factory. Machines will
>>> eventually be expressed as a class factory. When we implement this,
>>> we would change the default target-x86_64-cpu.cfg to:
>>>
>>> [system]
>>> # Load default CPU definitions
>>> readconfig = @DATADIR@/target-x86_64-cpus.cfg
>>> # Load default machines
>>> readconfig = @DATADIR@/target-x86_64-machines.cfg
>>>
>>> A machine definition would look like:
>>>
>>> [machinedef]
>>> name = pc-0.15
>>> virtio-blk.class_code = 32
>>> ...
>>>
>>> Loading a file based on -cpu doesn't generalize well unless we try to
>>> load a definition for any possible QOM type to find the class factory
>>> for it. I don't think this is a good idea.
>>
>> Why not load all class factories? Just don't instantiate any objects.
>
> Unless we have two different config syntaxes, I think it will lead to
> a lot of confusion. Having some parts of a config file be parsed and
> others not is fairly strange.

Parse all of them (and make sure all are class factories). The only real configuration item is that, without -nodefconfig, we create a -M pc-1.1 system. Everything else derives from that.

>> Otherwise, the meaning of -nodefconfig changes as more stuff is moved
>> out of .c and into .cfg.
>
> What's the problem with this?

The command line becomes unstable if you use -nodefconfig.

>>> In my target-$(ARCH).cfg, I have:
>>>
>>> [machine]
>>> enable-kvm = "on"
>>>
>>> Which means I don't have to use -enable-kvm anymore. But if you look
>>> at a tool like libguestfs, start up time is the most important thing,
>>> so avoiding unnecessary I/O and processing is critical.
>>
>> So this is definitely configuration (applies to the current instance)
>> as opposed to target-x86_64.cfg, which doesn't.
>
> I'm not sure which part you're responding to...

I was saying that target-x86_64.cfg appears to be definitions, not configuration, and was asking about qemu.cfg (which is configuration).

>> As far as I can tell, the only difference is that -nodefconfig -cpu
>> westmere will error out instead of working. But if you don't supply
>> -cpu westmere, the configuration is identical.
>
> What configuration?
>
> Let me ask, what do you think the semantics of -nodefconfig should
> be? I'm not sure I understand what you're advocating for.

-nodefconfig = create an empty machine, don't assume anything (= don't read qemu.cfg), let me build it out of all those lego bricks. Those can be defined in code or in definition files in /usr/share, I don't care.

Maybe that's -nodevices -vga none. But in this case I don't see the point in -nodefconfig. Not loading target-x86_64.cfg doesn't buy the user anything, since it wouldn't affect the guest in any way.
Re: [libvirt] [Qemu-devel] Modern CPU models cannot be used with libvirt
On 03/25/2012 04:36 PM, Anthony Liguori wrote: >> Apart from the command line length, it confuses configuration with >> definition. > > > There is no distinction with what we have today. Our configuration > file basically corresponds to command line options and as there is no > distinction in command line options, there's no distinction in the > configuration format. We don't have command line options for defining, only configuring. Again, defining = #define Configuring = modifying current instance > >> target-x86_64-cpus.cfg does not configure qemu for anything, it's merely >> the equivalent of >> >>#define westmere (x86_def_t) { ... } >>#define nehalem (x86_def_t) { ... } >>#define bulldozer (x86_def_t) { ... } // for PC >> >> so it should be read at each invocation. On the other hand, pc.cfg and >> westmere.cfg (as used previously) are shorthand for >> >> machine = (QEMUMachine) { ... }; >> cpu = (x86_def_t) { ... }; >> >> so they should only be read if requested explicitly (or indirectly). > > This doesn't make a lot of sense to me. Here's what I'm proposing: > > 1) QEMU would have a target-x86_64-cpu.cfg.in that is installed by > default in /etc/qemu. It would contain: > > [system] > # Load default CPU definitions > readconfig = @DATADIR@/target-x86_64-cpus.cfg > > 2) target-x86_64-cpus.cfg would be installed to @DATADIR@ and would > contain: > > [cpudef] > name = "Westmere" > ... > > This has the following properties: > > A) QEMU has no builtin notion of CPU definitions. It just has a "cpu > factory". -cpudef will create a new class called Westmere that can > then be enumerated through qom-type-list and created via qom-create. > > B) A management tool has complete control over cpu definitions without > modifying the underlying filesystem. -nodefconfig will prevent it > from loading and the management tool can explicitly load the QEMU > definition (via -readconfig, potentially using a /dev/fd/N path) or it > can define it's own cpu definitions. 
Why does -nodefconfig affect anything? The file defines westmere as an alias for a grab bag of options. Whether it's loaded or not is immaterial, unless someone uses one of the names within. > > C) This model maps to any other type of class factory. Machines will > eventually be expressed as a class factory. When we implement this, > we would change the default target-x86_64-cpu.cfg to: > > [system] > # Load default CPU definitions > readconfig = @DATADIR@/target-x86_64-cpus.cfg > # Load default machines > readconfig = @DATADIR@/target-x86_64-machines.cfg > > A machine definition would look like: > > [machinedef] > name = pc-0.15 > virtio-blk.class_code = 32 > ... > > Loading a file based on -cpu doesn't generalize well unless we try to > load a definition for any possible QOM type to find the class factory > for it. I don't think this is a good idea. Why not load all class factories? Just don't instantiate any objects. Otherwise, the meaning of -nodefconfig changes as more stuff is moved out of .c and into .cfg. > The reasoning is, loading target-x86_64-cpus.cfg does not alter the current instance's configuration, so reading it doesn't violate -nodefconfig. >>> >>> I think we have a different view of what -nodefconfig does. >>> >>> We have a couple options today: >>> >>> -nodefconfig >>> >>> Don't read the default configuration files. By default, we read >>> /etc/qemu/qemu.cfg and /etc/qemu/target-$(ARCH).cfg >>> >> >> The latter seems meaningless to avoid reading. It's just a set of >> #defines, what do you get by not reading it? > > In my target-$(ARCH).cfg, I have: > > [machine] > enable-kvm = "on" > > Which means I don't have to use -enable-kvm anymore. But if you look > at a tool like libguestfs, start up time is the most important thing > so avoiding unnecessary I/O and processing is critical. So this is definitely configuration (applies to the current instance) as opposed to target-x86_64.cfg, which doesn't. 
> >>> -nodefaults >>> >>> Don't create default devices. >>> >>> -vga none >>> >>> Don't create the default VGA device (not covered by -nodefaults). >>> >>> With these two options, the semantics you get are an absolutely >>> minimalistic instance of QEMU. Tools like libguestfs really want to >>> create the simplest guest and do the least amount of processing so the >>> guest runs as fast as possible. >>> >>> It does suck a lot that this isn't a single option. I would much >>> prefer -nodefaults to be implied by -nodefconfig. Likewise, I would >>> prefer that -nodefaults implied -vga none. >> >> I don't have a qemu.cfg so can't comment on it, but in what way does >> reading target-x86_64.cfg affect the current instance (that is, why is >> -nodefconfig needed over -nodefaults -vga look-at-the-previous-option?) > > It depends on what the user configures it to do. How? As far as I can tell, the only difference is that -nodefconfig -cpu westmere will error out instead of working. But if
Re: [libvirt] [Qemu-devel] Modern CPU models cannot be used with libvirt
On 03/25/2012 03:22 PM, Anthony Liguori wrote: In that case qemu -cpu westmere is shorthand for -readconfig /usr/share/qemu/cpus/westmere.cfg. >>> >>> >>> This is not a bad suggestion, although it would make -cpu ? a bit >>> awkward. Do you see an advantage to this over having >>> /usr/share/qemu/target-x86_64-cpus.cfg that's read early on? >> >> Nope. As long as qemu -nodefconfig -cpu westmere works, I'm happy. > > > Why? What's wrong with: > > qemu -nodefconfig -readconfig > /usr/share/qemu/cpus/target-x86_64-cpus.cfg \ > -cpu westmere > > And if that's not okay, would: > > qemu -nodefconfig -nocpudefconfig -cpu Westmere > > Not working be a problem? Apart from the command line length, it confuses configuration with definition. target-x86_64-cpus.cfg does not configure qemu for anything, it's merely the equivalent of #define westmere (x86_def_t) { ... } #define nehalem (x86_def_t) { ... } #define bulldozer (x86_def_t) { ... } // for PC so it should be read at each invocation. On the other hand, pc.cfg and westmere.cfg (as used previously) are shorthand for machine = (QEMUMachine) { ... }; cpu = (x86_def_t) { ... }; so they should only be read if requested explicitly (or indirectly). > >> The reasoning is, loading target-x86_64-cpus.cfg does not alter the >> current instance's configuration, so reading it doesn't violate >> -nodefconfig. > > I think we have a different view of what -nodefconfig does. > > We have a couple options today: > > -nodefconfig > > Don't read the default configuration files. By default, we read > /etc/qemu/qemu.cfg and /etc/qemu/target-$(ARCH).cfg > The latter seems meaningless to avoid reading. It's just a set of #defines, what do you get by not reading it? > -nodefaults > > Don't create default devices. > > -vga none > > Don't create the default VGA device (not covered by -nodefaults). > > With these two options, the semantics you get an absolutely > minimalistic instance of QEMU. 
Tools like libguestfs really want to > create the simplest guest and do the least amount of processing so the > guest runs as fast as possible. > > It does suck a lot that this isn't a single option. I would much > prefer -nodefaults to be implied by -nodefconfig. Likewise, I would > prefer that -nodefaults implied -vga none. I don't have a qemu.cfg so can't comment on it, but in what way does reading target-x86_64.cfg affect the current instance (that is, why is -nodefconfig needed over -nodefaults -vga look-at-the-previous-option?) >>>>> I think the thread has reduced to: should /usr/share configuration >>>>> files be read by default or just treated as additional configuration >>>>> files. >>>> >>>> If they're read as soon as they're referenced, what's the difference? >>> >>> I suspect libvirt would not be happy with reading configuration files >>> on demand.. >> >> Why not? > > It implies a bunch of SELinux labeling to make sVirt work. libvirt > tries very hard to avoid having QEMU read *any* files at all when it > starts up. The /usr/share/qemu files should be statically labelled to allow qemu to read them, so we can push more code into data files. -- error compiling committee.c: too many arguments to function -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] Modern CPU models cannot be used with libvirt
On 03/11/2012 04:12 PM, Anthony Liguori wrote: >> Let me elaborate about the latter. Suppose host CPU has kill_guest >> feature and at the time a guest was installed it was not implemented by >> kvm. Since it was not implemented by kvm it was not present in vcpu >> during installation and the guest didn't install "workaround kill_guest" >> module. Now unsuspecting user upgrades the kernel and tries to restart >> the guest and fails. He writes an angry letter to qemu-devel and is >> asked to >> reinstall his guest and move along. > > > -cpu best wouldn't solve this. You need a read/write configuration > file where QEMU probes the available CPU and records it to be used for > the lifetime of the VM. This doesn't work with live migration, and makes templating harder. The only persistent storage we can count on is disk images. The current approach is simple. The management tool determines the configuration, qemu applies it. Unidirectional information flow. This also lends itself to the management tool scanning a cluster and determining a GCD. > This discussion isn't about whether QEMU should have a Westmere > processor definition. In fact, I think I already applied that patch. > > It's a discussion about how we handle this up and down the stack. > > The question is who should define and manage CPU compatibility. Right > now QEMU does to a certain degree, libvirt discards this and does its > own thing, and VDSM/ovirt-engine assume that we're providing something > and has built a UI around it. > > What I'm proposing we consider: have VDSM manage CPU definitions in > order to provide a specific user experience in ovirt-engine. > > We would continue to have Westmere/etc in QEMU exposed as part of the > user configuration. But I don't think it makes a lot of sense to have > to modify QEMU any time a new CPU comes out. We have to. New features often come with new MSRs which need to be live migrated, and of course the cpu flags as well. 
We may push all these to qemu data files, but this is still qemu. We can't let a management tool decide that cpu feature X is safe to use on qemu version Y. -- error compiling committee.c: too many arguments to function
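The cluster "GCD" mentioned above reduces to intersecting the hosts' feature-flag sets: a guest restricted to the intersection can run, and migrate, on every host. A minimal sketch — the flag names here are invented for illustration, not actual qemu constants:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature-flag bits; real CPUID leaves have many more. */
enum {
    FEAT_SSE4_2 = 1u << 0,
    FEAT_AES    = 1u << 1,
    FEAT_AVX    = 1u << 2,
};

/* The cluster-wide baseline ("GCD") is the intersection of every
 * host's flag set. */
static uint64_t cluster_baseline(const uint64_t *host_flags, size_t nhosts)
{
    uint64_t gcd = ~(uint64_t)0;
    for (size_t i = 0; i < nhosts; i++)
        gcd &= host_flags[i];
    return gcd;
}
```

The management tool would probe each host once, compute this, and hand qemu the resulting model; qemu never decides on its own.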
Re: [libvirt] [Qemu-devel] Modern CPU models cannot be used with libvirt
On 03/25/2012 03:12 PM, Anthony Liguori wrote: >>> qemu -M pc >>> >>> Would effectively be short hand for -readconfig >>> /usr/share/qemu/machines/pc.cfg >> >> In that case >> >> qemu -cpu westmere >> >> is shorthand for -readconfig /usr/share/qemu/cpus/westmere.cfg. > > > This is not a bad suggestion, although it would make -cpu ? a bit > awkward. Do you see an advantage to this over having > /usr/share/qemu/target-x86_64-cpus.cfg that's read early on? Nope. As long as qemu -nodefconfig -cpu westmere works, I'm happy. The reasoning is, loading target-x86_64-cpus.cfg does not alter the current instance's configuration, so reading it doesn't violate -nodefconfig. >>> I think the thread has reduced to: should /usr/share configuration >>> files be read by default or just treated as additional configuration >>> files. >> >> If they're read as soon as they're referenced, what's the difference? > > I suspect libvirt would not be happy with reading configuration files > on demand.. Why not? -- error compiling committee.c: too many arguments to function
Re: [libvirt] [Qemu-devel] Modern CPU models cannot be used with libvirt
On 03/25/2012 02:55 PM, Anthony Liguori wrote: >> If cpu models are not part of configuration they should not be affected >> by configuration mechanism. You are just avoiding addressing the real >> question that I asked above. > > > I think you're just refusing to listen. > > The stated direction of QEMU, for literally years now, is that we want > to arrive at the following: > > QEMU is composed of a series of objects whose relationships can be > fully described by an external configuration file. Much of the > current baked in concepts (like machines) would then become > configuration files. > > qemu -M pc > > Would effectively be short hand for -readconfig > /usr/share/qemu/machines/pc.cfg In that case qemu -cpu westmere is shorthand for -readconfig /usr/share/qemu/cpus/westmere.cfg. > I think the thread has reduced to: should /usr/share configuration > files be read by default or just treated as additional configuration > files. If they're read as soon as they're referenced, what's the difference? -- error compiling committee.c: too many arguments to function
Re: [libvirt] [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
On 11/14/2011 11:58 AM, Kevin Wolf wrote: > Am 12.11.2011 11:25, schrieb Avi Kivity: > > On 11/11/2011 12:15 PM, Kevin Wolf wrote: > >> Am 10.11.2011 22:30, schrieb Anthony Liguori: > >>> Live migration with qcow2 or any other image format is just not going to > >>> work > >>> right now even with proper clustered storage. I think doing a block > >>> level flush > >>> cache interface and letting block devices decide how to do it is the best > >>> approach. > >> > >> I would really prefer reusing the existing open/close code. It means > >> less (duplicated) code, is existing code that is well tested and doesn't > >> make migration much of a special case. > >> > >> If you want to avoid reopening the file on the OS level, we can reopen > >> only the topmost layer (i.e. the format, but not the protocol) for now > >> and in 1.1 we can use bdrv_reopen(). > > > > Intuitively I dislike _reopen style interfaces. If the second open > > yields different results from the first, does it invalidate any > > computations in between? > > Not sure what results and what computation you mean, Result = open succeeded. Computation = anything that derives from the image, like size, or reading some stuff to guess CHS or something. > but let me clarify > a bit about bdrv_reopen: > > The main purpose of bdrv_reopen() is to change flags, for example toggle > O_SYNC during runtime in order to allow the guest to toggle WCE. This > doesn't necessarily mean a close()/open() sequence if there are other > means to change the flags, like fcntl() (or even using other protocols > than files). > > The idea here was to extend this to invalidate all caches if some > specific flag is set. As you don't change any other flag, this will > usually not be a reopen on a lower level. > > If we need to use open() though, and it fails (this is really the only > "different" result that comes to mind) (yes) > then bdrv_reopen() would fail and > the old fd would stay in use. 
Migration would have to fail, but I don't > think this case is ever needed for reopening after migration. Okay. > > > What's wrong with just delaying the open? > > Nothing, except that with today's code it's harder to do. > This has never stopped us (though it may delay us). -- error compiling committee.c: too many arguments to function
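The fcntl() route Kevin mentions — changing flags without a close()/open() cycle — looks roughly like the sketch below. One caveat worth keeping in mind: Linux honors F_SETFL only for a subset of flags (O_APPEND, O_NONBLOCK, O_DIRECT, and a few others), and notably not O_SYNC, which is one reason a real bdrv_reopen() may still need protocol-specific means:

```c
#include <fcntl.h>
#include <stdbool.h>

/* Toggle a single status flag (e.g. O_DIRECT for a cache-mode change)
 * on an already-open fd.  Returns 0 on success; on failure the old
 * flags stay in effect, matching the "old fd stays in use" behavior
 * described for bdrv_reopen(). */
static int toggle_flag(int fd, int flag, bool enable)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    flags = enable ? (flags | flag) : (flags & ~flag);
    return fcntl(fd, F_SETFL, flags) < 0 ? -1 : 0;
}
```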
Re: [libvirt] [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
On 11/12/2011 03:39 PM, Anthony Liguori wrote: > On 11/12/2011 04:27 AM, Avi Kivity wrote: >> On 11/11/2011 04:03 PM, Anthony Liguori wrote: >>> >>> I don't view not supporting migration with image formats as a >>> regression as it's never been a feature we've supported. While there >>> might be confusion about support around NFS, I think it's always been >>> clear that image formats cannot be used. >> >> Was there ever a statement to that effect? It was never clear to me and >> I doubt it was clear to anyone. > > You literally reviewed a patch whose subject was "block: allow > migration to work with image files"[1] that explained in gory detail > what the problem was. > > [1] http://mid.gmane.org/4c8cad7c.5020...@redhat.com > Isn't a patch fixing a problem with migrating image files a statement that we do support migrating image files? >> >>> >>> Given that, I don't think this is a candidate for 1.0. >>> >> >> Let's just skip 1.0 and do 1.1 instead. > > Let's stop being overly dramatic. You know as well as anyone that > image format support up until the coroutine conversion has had enough > problems that no one could practically be using them in a production > environment. They are used in production environments. > > Live migration is an availability feature. Up until the 1.0 release, > if you cared about availability and correctness, you would not be > using an image format. > Nevertheless, people who care about both availability and correctness do use image formats. In reality, migration and image formats are critical features for virtualization workloads. Pretending they're not makes the 1.0 release a joke. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [libvirt] [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
On 11/11/2011 12:15 PM, Kevin Wolf wrote: > Am 10.11.2011 22:30, schrieb Anthony Liguori: > > Live migration with qcow2 or any other image format is just not going to > > work > > right now even with proper clustered storage. I think doing a block level > > flush > > cache interface and letting block devices decide how to do it is the best > > approach. > > I would really prefer reusing the existing open/close code. It means > less (duplicated) code, is existing code that is well tested and doesn't > make migration much of a special case. > > If you want to avoid reopening the file on the OS level, we can reopen > only the topmost layer (i.e. the format, but not the protocol) for now > and in 1.1 we can use bdrv_reopen(). > Intuitively I dislike _reopen style interfaces. If the second open yields different results from the first, does it invalidate any computations in between? What's wrong with just delaying the open? -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [libvirt] [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
On 11/11/2011 04:03 PM, Anthony Liguori wrote: > > I don't view not supporting migration with image formats as a > regression as it's never been a feature we've supported. While there > might be confusion about support around NFS, I think it's always been > clear that image formats cannot be used. Was there ever a statement to that effect? It was never clear to me and I doubt it was clear to anyone. > > Given that, I don't think this is a candidate for 1.0. > Let's just skip 1.0 and do 1.1 instead. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [libvirt] [Qemu-devel] [PATCH v4] XBZRLE delta for live migration of large memory apps
On 08/11/2011 12:16 PM, Daniel P. Berrange wrote: On Thu, Aug 11, 2011 at 11:17:09AM +0300, Avi Kivity wrote: > On 08/10/2011 10:27 PM, Anthony Liguori wrote: > >>This may be acceptable, wait until the entire migration cluster is > >>xbzrle capable before enabling it. If not, add a monitor command. > > > > > >1) xbzrle needs to be disabled by default. That way management > >tools don't unknowingly enable it by not passing -no-xbzrle. > > We could hook it to -M, though it's a bit gross. > That would needlessly prevent its use for any existing installed guests > with an older machine type, which are running in a new QEMU. You could still enable it explicitly; I'm just trying to get it to be enabled by default. > Some kind of monitor capabilities seems good to me. Live migration is probably mostly done in managed environments, so I think you're right. -- error compiling committee.c: too many arguments to function
Re: [libvirt] [Qemu-devel] [PATCH v4] XBZRLE delta for live migration of large memory apps
On 08/10/2011 10:27 PM, Anthony Liguori wrote: This may be acceptable, wait until the entire migration cluster is xbzrle capable before enabling it. If not, add a monitor command. 1) xbzrle needs to be disabled by default. That way management tools don't unknowingly enable it by not passing -no-xbzrle. We could hook it to -M, though it's a bit gross. Otherwise we need to document this clearly in the management tool author's guide. 3) a management tool should be able to query the source and destination, and then enable xbzrle if both sides support it. You can argue that (3) could be static. A command could be added to toggle it dynamically through the monitor. But no matter what, someone has to touch libvirt and any other tool that works with QEMU to make this thing work. But this is a general problem. Any optional change to the migration protocol has exactly the same characteristics whether it's XBZRLE, XBZRLE v2 (if there is a v2), ASN.1, or any other form of compression that rolls around. If we have two-way communication we can do this transparently in the protocol itself. Instead of teaching management tools how to deal with all of these things, let's just fix this problem once. It just takes: a) A query-migration-caps command that returns a dict with two lists of strings. Something like: { 'execute': 'query-migration-caps' } { 'return' : { 'capabilities': [ 'xbzrle' ], 'current': [] } } b) A set-migration-caps command that takes a list of strings. It simply takes the intersection of the capabilities set with the argument and sets the current set to the result. Something like: { 'execute': 'set-migration-caps', 'arguments': { 'set': [ 'xbzrle' ] }} { 'return' : {} } c) An internal interface to register a capability and an internal interface to check if a capability is currently enabled. The xbzrle code just needs to disable itself if the capability isn't set. 
Then we teach libvirt (and other tools) to query the caps list on the source, set the destination, query the current set on the destination, and then set that set on the source. This is only if the capability has no side effect. As we introduce new things, like the next great compression protocol, or ASN.1, we don't need to touch libvirt again. libvirt can still know about the caps and selectively override QEMU if it's so inclined but it prevents us from reinventing the same mechanisms over and over again. Right. Yes. But that negotiation needs to become part of the "protocol" for migration. In the absence of that negotiation, we need to use the wire protocol we use today. We cannot have ad-hoc feature negotiation for every change we make to the wire protocol. Okay, as long as we have someone willing to implement it. -- error compiling committee.c: too many arguments to function
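Mechanically, the set-migration-caps semantics quoted above (current = intersection of the supported set with the argument) and the four-step dance the management tool performs both reduce to set intersection. A toy model with capabilities as bitmasks — the names are invented for illustration, not the actual QMP strings:

```c
#include <stdint.h>

/* Hypothetical capability bits standing in for the QMP string names. */
enum { MIG_CAP_XBZRLE = 1u << 0, MIG_CAP_ASN1 = 1u << 1 };

/* set-migration-caps: current = supported intersected with requested. */
static uint32_t mig_set_caps(uint32_t supported, uint32_t requested)
{
    return supported & requested;
}

/* What the management tool does: query source caps, set them on the
 * destination, query what the destination accepted, set that on the
 * source.  Both ends end up with the same 'current' set. */
static uint32_t mig_negotiate(uint32_t src_supported, uint32_t dst_supported)
{
    uint32_t dst_current = mig_set_caps(dst_supported, src_supported);
    return mig_set_caps(src_supported, dst_current);
}
```

Because each side only ever intersects, a capability ends up enabled iff both sides support it, which is exactly the property the thread wants for optional wire-format changes.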
Re: [libvirt] [Qemu-devel] [PATCH v4] XBZRLE delta for live migration of large memory apps
On 08/08/2011 05:04 PM, Daniel P. Berrange wrote: My main concern with all these scenarios where libvirt touches the actual data stream though is that we're introducing extra data copies into the migration path which potentially waste CPU cycles. If QEMU can directly XBZRLE encode data into the FD passed via 'fd:' then we minimize data copies. Whether this is a big enough benefit to offset the burden of having to maintain various compression code options in QEMU I can't answer. It's counterproductive to force an unneeded data copy in order to increase bandwidth. -- error compiling committee.c: too many arguments to function
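Encoding directly into the migration fd, as Daniel describes, is what scatter-gather I/O is for: hand the kernel the header and the already-encoded payload as separate pieces, with no staging buffer in between. A sketch, assuming the fd is an ordinary socket or pipe:

```c
#include <stddef.h>
#include <sys/uio.h>

/* Gather-write a header plus an already-encoded payload straight to
 * the migration fd; no intermediate copy is made in userspace. */
static ssize_t send_encoded(int fd, const void *hdr, size_t hlen,
                            const void *payload, size_t plen)
{
    struct iovec iov[2] = {
        { .iov_base = (void *)hdr,     .iov_len = hlen },
        { .iov_base = (void *)payload, .iov_len = plen },
    };
    return writev(fd, iov, 2);
}
```

If libvirt sat in the middle of the stream instead, every page would be copied at least once more between address spaces, which is the waste being objected to here.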
Re: [libvirt] [Qemu-devel] [PATCH v4] XBZRLE delta for live migration of large memory apps
On 08/08/2011 05:33 PM, Anthony Liguori wrote: If we have a shared object helper, the thread should be maintained by qemu proper, not the plugin. I wouldn't call it "migration transport", but instead a compression/decompression plugin. I don't think it merits a plugin at all though. There's limited scope for compression and it best sits in qemu proper. If anything, it needs to be more integrated (for example turning itself off if it doesn't match enough). That adds a tremendous amount of complexity to QEMU. Tremendous? You exaggerate. It's a lot simpler than the block or char layers, for example. If we're going to change our compression algorithm, we would need to use a single algorithm that worked well for a wide variety of workloads. That algorithm will have to include XBZRLE as a subset, since it matches what workloads actually do (touch memory sparsely). We struggle enough with migration as it is, it only would get worse if we have 10 different algorithms that we were dynamically enabling/disabling. The other option is to allow 1-off compression algorithms in the form of plugins. I think in this case, plugins are a pretty good compromise in terms of isolating complexity while allowing something that at least works very well for one particular type of workload. I think you underestimate the generality of XBZRLE (or maybe I'm overestimating it?). It's not reasonable to ask users to match a compression algorithm to their workload; most times they won't be interacting with the host at all. We need compression to be enabled at all times, turning itself off if it finds it isn't effective so it can consume less cpu. -- error compiling committee.c: too many arguments to function
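The "turning itself off if it isn't effective" behavior argued for above can be a simple running counter rather than anything elaborate. A sketch — the window size and cutoff are invented tuning values, not anything from qemu:

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed tuning values, purely illustrative. */
enum { WINDOW = 1024, CUTOFF_PCT = 20 };

typedef struct {
    unsigned pages, hits;
    bool enabled;
} DeltaStats;

/* Track how often delta encoding actually shrank a page; every WINDOW
 * pages, keep the encoder on only if the hit rate clears CUTOFF_PCT.
 * When disabled, pages go out uncompressed and no CPU is burned. */
static void delta_account(DeltaStats *s, size_t page_size, size_t encoded_size)
{
    s->pages++;
    if (encoded_size < page_size)
        s->hits++;
    if (s->pages == WINDOW) {
        s->enabled = s->hits * 100 / s->pages >= CUTOFF_PCT;
        s->pages = s->hits = 0;
    }
}
```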
Re: [libvirt] [Qemu-devel] [PATCH v4] XBZRLE delta for live migration of large memory apps
On 08/08/2011 05:15 PM, Anthony Liguori wrote: I think workload aware migration compression is possible for a lot of different types of workloads. That makes me a bit wary of QEMU growing quite a lot of compression mechanisms. It makes me think that this logic may really belong at a higher level where more information is known about the workload. For instance, I can imagine XBZRLE living in something like libvirt. A better model would be plugin based. exec helpers are plugins. They just live in a different address space and have a channel to exchange data (a pipe). libvirt isn't an exec helper. If we did .so plugins, which I'm really not opposed to, I'd want the interface to be something like: typedef struct MigrationTransportClass { ssize_t (*writev)(MigrationTransport *obj, struct iovec *iov, int iovcnt); } MigrationTransportClass; I think it's useful to use an interface like this because it makes it easy to put the transport in a dedicated thread that didn't hold qemu_mutex (which is sort of equivalent to using a fork'd helper but is zero-copy at the expense of less isolation). If we have a shared object helper, the thread should be maintained by qemu proper, not the plugin. I wouldn't call it "migration transport", but instead a compression/decompression plugin. I don't think it merits a plugin at all though. There's limited scope for compression and it best sits in qemu proper. If anything, it needs to be more integrated (for example turning itself off if it doesn't match enough). -- error compiling committee.c: too many arguments to function
Re: [libvirt] [Qemu-devel] [PATCH v4] XBZRLE delta for live migration of large memory apps
On 08/08/2011 04:41 PM, Alexander Graf wrote: In general, I believe it's a good idea to keep looking at libvirt as a vm management layer and only a vm management layer. Very much yes. -- error compiling committee.c: too many arguments to function
Re: [libvirt] [Qemu-devel] [PATCH v4] XBZRLE delta for live migration of large memory apps
On 08/08/2011 04:29 PM, Anthony Liguori wrote: One thing that strikes me about this algorithm is that it's very good for a particular type of workload--shockingly good really. Poking bytes at random places in memory is fairly generic. If you have a lot of small objects, and modify a subset of them, this is the pattern you get. I think workload aware migration compression is possible for a lot of different types of workloads. That makes me a bit wary of QEMU growing quite a lot of compression mechanisms. It makes me think that this logic may really belong at a higher level where more information is known about the workload. For instance, I can imagine XBZRLE living in something like libvirt. A better model would be plugin based. -- error compiling committee.c: too many arguments to function
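For reference, the core of the idea is small, which supports the point that it is generic delta coding rather than workload-specific plumbing: XOR-style comparison of the new page against the previously sent copy, with unchanged bytes sent as run lengths only. A simplified sketch — the actual patch's on-wire record format differs:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encode 'cur' against 'old' as (zero-run, literal-run) records; run
 * lengths are 16-bit little-endian, so pages must be <= 64 KiB here. */
static size_t xbzrle_encode(const uint8_t *old, const uint8_t *cur,
                            size_t page, uint8_t *out)
{
    size_t i = 0, olen = 0;
    while (i < page) {
        size_t zrun = 0, nzrun = 0;
        while (i + zrun < page && old[i + zrun] == cur[i + zrun])
            zrun++;                              /* unchanged bytes */
        i += zrun;
        while (i + nzrun < page && old[i + nzrun] != cur[i + nzrun])
            nzrun++;                             /* changed bytes */
        out[olen++] = zrun & 0xff;  out[olen++] = zrun >> 8;
        out[olen++] = nzrun & 0xff; out[olen++] = nzrun >> 8;
        memcpy(out + olen, cur + i, nzrun);
        olen += nzrun;
        i += nzrun;
    }
    return olen;
}

/* Apply the delta on top of the receiver's copy of the old page. */
static void xbzrle_decode(uint8_t *page_buf, const uint8_t *in, size_t ilen)
{
    size_t i = 0, pos = 0;
    while (i < ilen) {
        size_t zrun  = in[i] | (size_t)in[i + 1] << 8;
        size_t nzrun = in[i + 2] | (size_t)in[i + 3] << 8;
        i += 4;
        pos += zrun;                 /* unchanged: keep the old bytes */
        memcpy(page_buf + pos, in + i, nzrun);
        i += nzrun;
        pos += nzrun;
    }
}
```

A page with a few scattered dirty bytes — the "lot of small objects, modify a subset" pattern — encodes to a handful of records, which is where the "shockingly good" ratios come from.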
Re: [libvirt] [Qemu-devel] live snapshot wiki updated
On 07/20/2011 04:51 PM, Kevin Wolf wrote: > > The problem is that QEMU will find backing file names inside the > images which it will be unable to open. How do you suggest we get around > that? > This is the part with allowing libvirt to override the backing file. > Of course, this is not something that we can add with five lines of > code, it requires -blockdev. It can be done without blockdev. Have a dictionary that translates filenames, and populate it from the command line (for a bonus, translate a filename to a file descriptor inherited from the caller or passed via the monitor). -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
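The dictionary can be this small. A sketch — the PathAlias type and table format are invented for illustration, not an actual qemu interface:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical translation table: backing-file names found inside an
 * image are looked up here before any open(2) is attempted, letting
 * the caller override them from the command line or the monitor. */
typedef struct {
    const char *from;   /* name stored in the image header */
    const char *to;     /* what to actually open (or "/dev/fd/N") */
} PathAlias;

static const char *translate_path(const PathAlias *tab, size_t n,
                                  const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(tab[i].from, name) == 0)
            return tab[i].to;
    return name;   /* no alias: fall through to the literal name */
}
```

Because the lookup happens at open time, it works for arbitrarily deep backing chains without qemu needing any per-format knowledge of how many backing files exist.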
Re: [libvirt] [Qemu-devel] [PATCH v2] Add support for fd: protocol
On 06/20/2011 10:11 PM, Anthony Liguori wrote: It would need careful explanation in the management tool author's guide, yes. The main advantage is generality. It doesn't assume that a file format has just one backing file, and doesn't require new syntax wherever a file is referred to indirectly. FWIW, with blockdev, we need options to control this all anyway. If you go back to my QCFG proposal, the parameters would actually be format specific, so if we had: -block file=fd:4,format=fancypantsformat,part0=hd0-back.part1,part1=hd0-back.part2... Yeah. We either name the formal argument (your proposal) or the actual argument (mine). -- error compiling committee.c: too many arguments to function
Re: [libvirt] [Qemu-devel] [PATCH v2] Add support for fd: protocol
On 06/20/2011 04:50 PM, Anthony Liguori wrote: On 06/20/2011 08:40 AM, Avi Kivity wrote: On 06/14/2011 04:31 PM, Corey Bryant wrote: - Starting Qemu with a backing file For this we could tell qemu that a file named "xyz" is available via fd n, via an extension of the getfd command. For example (qemu) getfd path="/images/my-image.img" (qemu) getfd path="/images/template.img" (qemu) drive-add path="/images/my-image.img" The open() for my-image.img first looks up the name in the getfd database, and finds it, so it returns the fd from there instead of opening. It then opens the backing file ("template.img") and looks it up again, and finds the second fd from the session. The way I've been thinking about this is: -blockdev id=hd0-back,file=fd:4,format=raw \ -blockdev file=fd:3,format=qcow2,backing=hd0-back While your proposal is clever, it makes me a little nervous about subtle security ramifications. It would need careful explanation in the management tool author's guide, yes. The main advantage is generality. It doesn't assume that a file format has just one backing file, and doesn't require new syntax wherever a file is referred to indirectly. -- error compiling committee.c: too many arguments to function
Re: [libvirt] [Qemu-devel] [PATCH v2] Add support for fd: protocol
On 06/14/2011 04:31 PM, Corey Bryant wrote: - Starting Qemu with a backing file For this we could tell qemu that a file named "xyz" is available via fd n, via an extension of the getfd command. For example (qemu) getfd path="/images/my-image.img" (qemu) getfd path="/images/template.img" (qemu) drive-add path="/images/my-image.img" The open() for my-image.img first looks up the name in the getfd database, and finds it, so it returns the fd from there instead of opening. It then opens the backing file ("template.img") and looks it up again, and finds the second fd from the session. The result is that open()s are satisfied from the monitor, instead of the host kernel, but without reversing the request/reply nature of the monitor protocol. A similar extension could be added to the command line: qemu -drive file=fd:4,cache=none -path-alias name=/images/template.img,path=fd:5 Here the main image is opened via fd 4; if it needs template.img, it gets shunted to fd 5. -- error compiling committee.c: too many arguments to function
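The getfd-database lookup described above can be modeled as a table consulted before falling back to the host kernel. A sketch — the names are illustrative, not the actual qemu monitor API:

```c
#include <fcntl.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical monitor-populated table mapping paths to fds received
 * via getfd.  An open is satisfied from here when possible, so the
 * monitor, not the host kernel, resolves the name. */
typedef struct {
    const char *path;
    int fd;
} FdAlias;

static int open_aliased(const FdAlias *tab, size_t n,
                        const char *path, int flags)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(tab[i].path, path) == 0)
            return tab[i].fd;        /* satisfied from the getfd database */
    return open(path, flags);        /* fall back to the host kernel */
}
```

This is what keeps the monitor protocol request/reply: qemu never has to ask the management tool for an fd mid-open, because the fds were supplied up front.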
Re: [libvirt] [Qemu-devel] [PATCH v2 3/3] raw-posix: Re-open host CD-ROM after media change
On 04/05/2011 12:12 PM, Amit Shah wrote: On (Tue) 05 Apr 2011 [12:00:38], Avi Kivity wrote: > On 04/05/2011 11:09 AM, Amit Shah wrote: > >On (Tue) 05 Apr 2011 [10:48:16], Avi Kivity wrote: > >> On 04/05/2011 09:41 AM, Amit Shah wrote: > >> >See http://www.spinics.net/lists/linux-scsi/msg51504.html > >> > >> I see this is quite fresh. What are the plans here? > > > >We're still discussing where the fix should be, but it certainly is a > >kernel bug and should be fixed there, and then applied to stable. > > > >However, there are other bugs in qemu which will prevent the right > >size changes to be visible in the guest (the RFC series I sent out > >earlier in this thread needs to be applied to QEMU at the least; the > >series has grown in my development tree since the time I sent that one > >out). So essentially we need to update both the hypervisor and the > >guest to get proper CDROM media change support. > > Why do we need to update the guest for a qemu bug? What is the qemu bug? Guest kernel bug: CDROM change event missed, so the revalidate call isn't made, which causes stale data (like disc size) to be used on newer media. qemu bug: We don't handle the GET_EVENT_STATUS_NOTIFICATION command from guests (which is a mandatory command according to the SCSI spec) which the guest uses to detect CDROM changes. Once this command is implemented, QEMU sends the required info the guest needs to detect CDROM changes. I have this implemented locally (also sent as RFC PATCH 2/3 in the 'cdrom bug roundup' thread). So: even if qemu is updated to handle this command, the guest won't work correctly since it misses the event. Okay. We aren't responsible for guest kernel bugs, especially those which apply to real hardware (we should make more effort for virtio bugs). It's enough that we fix qemu here. > >It also looks like we can't have a workaround in QEMU to get older > >guests to work. > > Older guests? or older hosts? Older guests (not patched with fix for the bug described above). 
Since the guest kernel completely misses the disc change event in the path that does the revalidation, there's nothing qemu can do that will make such older guests notice disc change. Also: if only the guest kernel is updated but qemu is not, things still won't work since qemu will never send valid information for the GET_EVENT_STATUS_NOTIFICATION command. > >However, a hack in the kernel can be used without any QEMU changes > >(revalidate disk on each sr_open() call, irrespective of detecting any > >media change). I'm against doing that for upstream, but downstreams > >could do that for new guest - old hypervisor compat. > > Seriously confused. Please use the terms "host kernel" and "qemu" > instead of "hypervisor" which is ambiguous. OK: this last bit says that forcefully revalidating discs in the guest kernel when a guest userspace opens the disc will ensure size changes are reflected properly for guest userspace. So in this case, even if we're using an older qemu which doesn't implement GET_EVENT_STATUS_NOTIFICATION, guest userspace apps will work fine. This is obviously a hack. Yes. Thanks for the clarification. (let's see if I really got it - we have a kernel bug that hits both the guest and the host, plus a qemu bug?) -- error compiling committee.c: too many arguments to function -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v2 3/3] raw-posix: Re-open host CD-ROM after media change
On 04/05/2011 11:09 AM, Amit Shah wrote: On (Tue) 05 Apr 2011 [10:48:16], Avi Kivity wrote: > On 04/05/2011 09:41 AM, Amit Shah wrote: > >See http://www.spinics.net/lists/linux-scsi/msg51504.html > > I see this is quite fresh. What are the plans here? We're still discussing where the fix should be, but it certainly is a kernel bug and should be fixed there, and then applied to stable. However, there are other bugs in qemu which will prevent the right size changes to be visible in the guest (the RFC series I sent out earlier in this thread needs to be applied to QEMU at the least; the series has grown in my development tree since the time I sent that one out). So essentially we need to update both the hypervisor and the guest to get proper CDROM media change support. Why do we need to update the guest for a qemu bug? What is the qemu bug? It also looks like we can't have a workaround in QEMU to get older guests to work. Older guests? or older hosts? However, a hack in the kernel can be used without any QEMU changes (revalidate disk on each sr_open() call, irrespective of detecting any media change). I'm against doing that for upstream, but downstreams could do that for new guest - old hypervisor compat. Seriously confused. Please use the terms "host kernel" and "qemu" instead of "hypervisor" which is ambiguous. -- error compiling committee.c: too many arguments to function -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v2 3/3] raw-posix: Re-open host CD-ROM after media change
On 04/05/2011 09:41 AM, Amit Shah wrote: See http://www.spinics.net/lists/linux-scsi/msg51504.html I see this is quite fresh. What are the plans here? -- error compiling committee.c: too many arguments to function -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v2 3/3] raw-posix: Re-open host CD-ROM after media change
On 04/04/2011 06:09 PM, Stefan Hajnoczi wrote: On Mon, Apr 4, 2011 at 2:49 PM, Avi Kivity wrote: > On 04/04/2011 04:38 PM, Anthony Liguori wrote: >> >> On 04/04/2011 08:22 AM, Avi Kivity wrote: >>> >>> On 04/03/2011 02:57 PM, Stefan Hajnoczi wrote: >>>> >>>> In order for media change to work with Linux host CD-ROM it is >>>> necessary to reopen the file (otherwise the inode size will not >>>> refresh, this is an issue with existing kernels). >>>> >>> >>> Maybe we should fix the bug in Linux (and backport as necessary)? >>> >>> I think cd-rom assignment is sufficiently obscure that we can require a >>> fixed kernel instead of providing a workaround. >> >> Do reads fail after CD change? Or do they succeed and the size is just >> reported incorrectly? >> >> If it's the latter, I'd agree that it needs fixing in the kernel. If it's >> the former, I'd say it's clearly a feature. >> > > Even if it's a documented or intentional feature, we can add an ioctl to > "refresh" the device with up-to-date data. It's possible to fix this in the kernel. I just haven't written the patch yet. The inode size needs to be updated when the new medium is detected. I haven't tested but I suspect reads within the size of the previous medium will succeed. But if the new medium is larger, then reads beyond the old medium size will fail. The size reported by lseek(fd, 0, SEEK_END) is outdated. I believe a kernel fix is best in that case, leaving qemu alone. -- error compiling committee.c: too many arguments to function -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
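The workaround under discussion amounts to re-querying the size through a fresh descriptor. A rough model of that query (on ordinary files rather than a real CD-ROM device, so the staleness itself is not reproduced here):

```python
import os
import tempfile

def device_length(path):
    # Re-open the node and ask for its size with lseek(fd, 0, SEEK_END),
    # the same query raw-posix uses; on an affected kernel the re-open is
    # what would force the inode size to refresh after a media change.
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.lseek(fd, 0, os.SEEK_END)
    finally:
        os.close(fd)

# Demo on an ordinary file: the reported length tracks the file as it grows.
demo = tempfile.NamedTemporaryFile(delete=False)
demo.write(b"\0" * 512)
demo.flush()
first = device_length(demo.name)
demo.write(b"\0" * 512)
demo.flush()
second = device_length(demo.name)
```

The kernel-fix position in the thread is that this re-open dance should be unnecessary: the inode size should be refreshed when the new medium is detected, leaving qemu's read path unchanged.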
Re: [libvirt] [Qemu-devel] [PATCH v2 3/3] raw-posix: Re-open host CD-ROM after media change
On 04/04/2011 04:38 PM, Anthony Liguori wrote: On 04/04/2011 08:22 AM, Avi Kivity wrote: On 04/03/2011 02:57 PM, Stefan Hajnoczi wrote: In order for media change to work with Linux host CD-ROM it is necessary to reopen the file (otherwise the inode size will not refresh, this is an issue with existing kernels). Maybe we should fix the bug in Linux (and backport as necessary)? I think cd-rom assignment is sufficiently obscure that we can require a fixed kernel instead of providing a workaround. Do reads fail after CD change? Or do they succeed and the size is just reported incorrectly? If it's the latter, I'd agree that it needs fixing in the kernel. If it's the former, I'd say it's clearly a feature. Even if it's a documented or intentional feature, we can add an ioctl to "refresh" the device with up-to-date data. -- error compiling committee.c: too many arguments to function -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] [PATCH v2 3/3] raw-posix: Re-open host CD-ROM after media change
On 04/03/2011 02:57 PM, Stefan Hajnoczi wrote: In order for media change to work with Linux host CD-ROM it is necessary to reopen the file (otherwise the inode size will not refresh, this is an issue with existing kernels). Maybe we should fix the bug in Linux (and backport as necessary)? I think cd-rom assignment is sufficiently obscure that we can require a fixed kernel instead of providing a workaround. -- error compiling committee.c: too many arguments to function -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] Startup/Shutdown scripts for KVM Machines in Debian (libvirt)
On 11/10/2010 10:01 AM, Hermann Himmelbauer wrote: Hi, I manage my KVM machines via libvirt and wonder if there are any init.d scripts for automatically starting up and shutting down virtual machines during boot/shutdown of the host? Writing this for myself seems to be not that simple, as when shutting down, the system has somehow to wait until all machines are halted (not responding guests have to be destroyed etc.), and I don't really know how to accomplish this. My host system is Debian Lenny, is there anything available? Perhaps libvirt offers something I'm unaware of? I think it does. Copying the libvirt mailing list. -- error compiling committee.c: too many arguments to function -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
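Lacking a packaged script, the core of such an init.d "stop" action is the shutdown/poll/destroy loop the poster describes. A sketch with the virsh calls abstracted away (`list_running`, `shutdown` and `destroy` are hypothetical stand-ins for `virsh list`, `virsh shutdown` and `virsh destroy`, not a libvirt API):

```python
import time

def stop_all_domains(list_running, shutdown, destroy,
                     timeout=300, poll_interval=1.0):
    # Ask every running guest for a clean (ACPI) shutdown ...
    for dom in list_running():
        shutdown(dom)
    # ... wait until they have all halted or the timeout expires ...
    deadline = time.monotonic() + timeout
    while list_running() and time.monotonic() < deadline:
        time.sleep(poll_interval)
    # ... and destroy whatever is still up (guests that ignored ACPI).
    for dom in list_running():
        destroy(dom)
```

The timeout is the piece the poster found non-obvious: without it, one unresponsive guest blocks host shutdown forever.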
Re: [libvirt] [Qemu-devel] QEMU interfaces for image streaming and post-copy block migration
On 09/12/2010 07:19 PM, Anthony Liguori wrote: On 09/12/2010 11:45 AM, Avi Kivity wrote: Streaming relies on copy-on-read to do the writing. Ah. You can avoid the copy-on-read implementation in the block format driver and do it completely in generic code. Copy on read takes advantage of temporal locality. You wouldn't want to stream without copy on read because you decrease your idle I/O time by not effectively caching. I meant, implement copy-on-read in generic code side by side with streaming. Streaming becomes just a prefetch operation (read and discard) which lets copy-on-read do the rest. This is essentially your implementation, yes? stream_4(): increment offset if more: bdrv_aio_stream() Of course, need to serialize wrt guest writes, which adds a bit more complexity. I'll leave it to you to code the state machine for that. http://repo.or.cz/w/qemu/aliguori.git/commitdiff/d44ea43be084cc879cd1a33e1a04a105f4cb7637?hp=34ed425e7dd39c511bc247d1ab900e19b8c74a5d Clever - it pushes all the synchronization into the copy-on-read implementation. But the serialization there hardly jumps out of the code. Do I understand correctly that you can only have one allocating read or write running? Cluster allocation, L2 cache allocation, or on-disk L2 allocation? You only have one on-disk L2 allocation at one time. That's just an implementation detail at the moment. An on-disk L2 allocation happens only when writing to a new cluster that requires a totally new L2 entry. Since L2s cover 2GB of logical space, it's a rare event so this turns out to be pretty reasonable for a first implementation. Parallel on-disk L2 allocations is not that difficult, it's just a future TODO. Really, you can just preallocate all L2s. Most filesystems will touch all of them very soon. qcow2 might save some space for snapshots which share L2s (doubtful) or for 4k clusters (historical) but for qed with 64k clusters, it doesn't save any space. Linear L2s will also make your fsck *much* quicker. 
Size is .01% of logical image size. 1MB for a 10GB guest, by the time you install something on it that's a drop in the bucket. If you install a guest on a 100GB disk, what percentage of L2s are allocated? Generally, I think the block layer makes more sense if the interface to the formats is high level and code sharing is achieved not by mandating a world view but rather by making libraries of common functionality. This is more akin to how the FS layer works in Linux. So IMHO, we ought to add a bdrv_aio_commit function, turn the current code into a generic_aio_commit, implement a qed_aio_commit, then somehow do qcow2_aio_commit, and look at what we can refactor into common code. What Linux does is have an equivalent of bdrv_generic_aio_commit() which most implementations call (or default to), and only do something if they want something special. Something like commit (or copy-on-read, or copy-on-write, or streaming) can be implemented 100% in terms of the generic functions (and indeed qcow2 backing files can be any format). Yes, what I'm really saying is that we should take the bdrv_generic_aio_commit() approach. I think we're in agreement here. Strange feeling. -- error compiling committee.c: too many arguments to function -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
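For scale, the ".01%" figure follows directly from the metadata geometry. A back-of-envelope check, assuming QED-like parameters (64 KiB clusters and 8-byte table entries are assumptions here, not quoted from the thread):

```python
CLUSTER_SIZE = 64 * 1024   # assumed cluster size
ENTRY_SIZE = 8             # assumed bytes per L2 table entry

# One 8-byte L2 entry maps one 64 KiB cluster, so fully-allocated L2
# metadata is entry_size/cluster_size of the logical image size:
overhead = ENTRY_SIZE / CLUSTER_SIZE      # ~0.012% of logical size

l2_bytes_10g = 10 * 1024**3 * overhead    # ~1.25 MiB for a 10 GiB image
```

Which is why preallocating every L2 table up front costs essentially nothing relative to the image itself.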
Re: [libvirt] [Qemu-devel] QEMU interfaces for image streaming and post-copy block migration
On 09/12/2010 05:23 PM, Anthony Liguori wrote: On 09/12/2010 08:40 AM, Avi Kivity wrote: Why would it serialize all I/O operations? It's just like another vcpu issuing reads. Because the block layer isn't re-entrant. A threaded block layer is reentrant. Of course pushing the thing into a thread requires that. What you basically do is:

    stream_step_three():
        complete()

    stream_step_two(offset, length):
        bdrv_aio_readv(offset, length, buffer, stream_step_three)

    bdrv_aio_stream():
        bdrv_aio_find_free_cluster(stream_step_two)

Isn't there a write() missing somewhere? Streaming relies on copy-on-read to do the writing. Ah. You can avoid the copy-on-read implementation in the block format driver and do it completely in generic code. And that's exactly what the current code looks like. The only change to the patch that this does is make some of qed's internals be block layer interfaces. Why do you need find_free_cluster()? That's a physical offset thing. Just write to the same logical offset. IOW:

    bdrv_aio_stream():
        bdrv_aio_read(offset, stream_2)

It's an optimization. If you've got a fully missing L1 entry, then you're going to memset() 2GB worth of zeros. That's just wasted work. With a 1TB image with a 1GB allocation, it's a huge amount of wasted work. Ok. And it's a logical offset, not physical as I thought, which confused me.

    stream_2():
        if all zeros:
            increment offset
            if more:
                bdrv_aio_stream()
        bdrv_aio_write(offset, stream_3)

    stream_3():
        bdrv_aio_write(offset, stream_4)

I don't understand why stream_3() is needed. This implementation doesn't rely on copy-on-read code in the block format driver. It is generic and uses existing block layer interfaces. It would need copy-on-read support in the generic block layer as well.

    stream_4():
        increment offset
        if more:
            bdrv_aio_stream()

Of course, need to serialize wrt guest writes, which adds a bit more complexity. I'll leave it to you to code the state machine for that. 
http://repo.or.cz/w/qemu/aliguori.git/commitdiff/d44ea43be084cc879cd1a33e1a04a105f4cb7637?hp=34ed425e7dd39c511bc247d1ab900e19b8c74a5d Clever - it pushes all the synchronization into the copy-on-read implementation. But the serialization there hardly jumps out of the code. Do I understand correctly that you can only have one allocating read or write running? Parts of it are: commit. Of course, that's horribly synchronous. If you've got AIO internally, making commit work is pretty easy. Doing asynchronous commit at a generic layer is not easy though unless you expose lots of details. I don't see why. Commit is a simple loop that copies all clusters. All it needs to know is if a cluster is allocated or not. When commit is running you need additional serialization against guest writes, and to direct guest writes and reads of the committed region to the backing file instead of the temporary image. But the block layer already knows of all guest writes. Generally, I think the block layer makes more sense if the interface to the formats is high level and code sharing is achieved not by mandating a world view but rather by making libraries of common functionality. This is more akin to how the FS layer works in Linux. So IMHO, we ought to add a bdrv_aio_commit function, turn the current code into a generic_aio_commit, implement a qed_aio_commit, then somehow do qcow2_aio_commit, and look at what we can refactor into common code. What Linux does is have an equivalent of bdrv_generic_aio_commit() which most implementations call (or default to), and only do something if they want something special. Something like commit (or copy-on-read, or copy-on-write, or streaming) can be implemented 100% in terms of the generic functions (and indeed qcow2 backing files can be any format). -- error compiling committee.c: too many arguments to function -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] QEMU interfaces for image streaming and post-copy block migration
On 09/12/2010 03:25 PM, Anthony Liguori wrote: On 09/12/2010 07:41 AM, Avi Kivity wrote: On 09/07/2010 05:57 PM, Anthony Liguori wrote: I agree that streaming should be generic, like block migration. The trivial generic implementation is:

    void bdrv_stream(BlockDriverState *bs)
    {
        for (sector = 0; sector < bdrv_getlength(bs); sector += n) {
            if (!bdrv_is_allocated(bs, sector, &n)) {

Three problems here. First problem is that bdrv_is_allocated is synchronous. Put the whole thing in a thread. It doesn't fix anything. You don't want stream to serialize all I/O operations. Why would it serialize all I/O operations? It's just like another vcpu issuing reads. The second problem is that streaming makes the most sense when it's the smallest useful piece of work whereas bdrv_is_allocated() may return a very large range. You could cap it here but you then need to make sure that cap is at least cluster_size to avoid a lot of unnecessary I/O. That seems like a nice solution. You probably want a multiple of the cluster size to retain efficiency. What you basically do is:

    stream_step_three():
        complete()

    stream_step_two(offset, length):
        bdrv_aio_readv(offset, length, buffer, stream_step_three)

    bdrv_aio_stream():
        bdrv_aio_find_free_cluster(stream_step_two)

Isn't there a write() missing somewhere? And that's exactly what the current code looks like. The only change to the patch that this does is make some of qed's internals be block layer interfaces. Why do you need find_free_cluster()? That's a physical offset thing. Just write to the same logical offset. IOW:

    bdrv_aio_stream():
        bdrv_aio_read(offset, stream_2)

It's an optimization. If you've got a fully missing L1 entry, then you're going to memset() 2GB worth of zeros. That's just wasted work. With a 1TB image with a 1GB allocation, it's a huge amount of wasted work. Ok. And it's a logical offset, not physical as I thought, which confused me.

    stream_2():
        if all zeros:
            increment offset
            if more:
                bdrv_aio_stream()
        bdrv_aio_write(offset, stream_3)

    stream_3():
        bdrv_aio_write(offset, stream_4)

    stream_4():
        increment offset
        if more:
            bdrv_aio_stream()

Of course, need to serialize wrt guest writes, which adds a bit more complexity. I'll leave it to you to code the state machine for that. 
One of the things Stefan has mentioned is that a lot of the QED code could be reused by other formats. All formats implement things like CoW on their own today but if you exposed interfaces like bdrv_aio_find_free_cluster(), you could actually implement a lot more in the generic block layer. So, I agree with you in principle that this all should be common code. I think it's a larger effort though. Not that large I think; and it will make commit async as a side effect. The QED streaming implementation is 140 LOCs too so you quickly end up adding more code to the block formats to support these new interfaces than it takes to just implement it in the block format. bdrv_is_allocated() already exists (and is needed for commit), what else is needed? cluster size? Synchronous implementations are not reusable to implement asynchronous anything. Surely this is easy to fix, at least for qed. What we need is thread infrastructure that allows us to convert between the two methods. But you need the code to be cluster aware too. Yes, another variable in BlockDriverState. Third problem is that streaming really requires being able to do zero write detection in a meaningful way. You don't want to always do zero write detection so you need another interface to mark a specific write as a write that should be checked for zeros. You can do that in bdrv_stream(), above, before the actual write, and call bdrv_unmap() if you detect zeros. My QED branch now does that FWIW. At the moment, it only detects zero reads to unallocated clusters and writes a special zero cluster marker. However, the detection code is in the generic path so once the fsck() logic is working, we can implement a free list in QED. In QED, the detection code needs to have a lot of knowledge about cluster boundaries and the format of the device. In principle, this should be common code but it's not for the same reason copy-on-write is not common code today. Parts of it are: commit. Of course, that's horribly synchronous. 
-- error compiling committee.c: too many arguments to function -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] QEMU interfaces for image streaming and post-copy block migration
On 09/07/2010 05:57 PM, Anthony Liguori wrote: I agree that streaming should be generic, like block migration. The trivial generic implementation is:

    void bdrv_stream(BlockDriverState *bs)
    {
        for (sector = 0; sector < bdrv_getlength(bs); sector += n) {
            if (!bdrv_is_allocated(bs, sector, &n)) {

Three problems here. First problem is that bdrv_is_allocated is synchronous. Put the whole thing in a thread. The second problem is that streaming makes the most sense when it's the smallest useful piece of work whereas bdrv_is_allocated() may return a very large range. You could cap it here but you then need to make sure that cap is at least cluster_size to avoid a lot of unnecessary I/O. That seems like a nice solution. You probably want a multiple of the cluster size to retain efficiency. The QED streaming implementation is 140 LOCs too so you quickly end up adding more code to the block formats to support these new interfaces than it takes to just implement it in the block format. bdrv_is_allocated() already exists (and is needed for commit), what else is needed? cluster size? Third problem is that streaming really requires being able to do zero write detection in a meaningful way. You don't want to always do zero write detection so you need another interface to mark a specific write as a write that should be checked for zeros. You can do that in bdrv_stream(), above, before the actual write, and call bdrv_unmap() if you detect zeros. -- error compiling committee.c: too many arguments to function -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
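Fleshing out the loop above as runnable pseudocode, including the zero-detection-plus-unmap idea: the bdrv_* operations are modeled here as plain callables, so this illustrates the control flow only, not qemu code.

```python
def stream(image_len, is_allocated, read_backing, write, unmap,
           step=64 * 1024):
    # Walk the image; for each unallocated range, pull the data up from
    # the backing file.  Zero detection turns runs of zeros into unmapped
    # (sparse) regions instead of allocating clusters for them.
    offset = 0
    while offset < image_len:
        n = min(step, image_len - offset)
        if not is_allocated(offset, n):
            data = read_backing(offset, n)
            if any(data):
                write(offset, data)
            else:
                unmap(offset, n)   # the bdrv_unmap() idea from the thread
        offset += n
```

The fixed `step` models the "cap at a multiple of the cluster size" point: small enough to be an interruptible unit of work, large enough to avoid sub-cluster I/O.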
Re: [libvirt] [Qemu-devel] QEMU interfaces for image streaming and post-copy block migration
On 09/07/2010 04:41 PM, Anthony Liguori wrote: Hi, We've got copy-on-read and image streaming working in QED and before going much further, I wanted to bounce some interfaces off of the libvirt folks to make sure our final interface makes sense. Here's the basic idea: Today, you can create images based on base images that are copy on write. With QED, we also support copy on read which forces a copy from the backing image on read requests and write requests. Is copy on read QED specific? It looks very similar to the commit command, except with I/O directions reversed. IIRC, commit looks like:

    for each sector:
        if image.mapped(sector):
            backing_image.write(sector, image.read(sector))

whereas copy-on-read looks like:

    def copy_on_read():
        set_ioprio(idle)
        for each sector:
            if not image.mapped(sector):
                image.write(sector, backing_image.read(sector))

    run_in_thread(copy_on_read)

With appropriate locking. In addition to copy on read, we introduce a notion of streaming a block device which means that we search for an unallocated region of the leaf image and force a copy-on-read operation. The combination of copy-on-read and streaming means that you can start a guest based on slow storage (like over the network) and bring in blocks on demand while also having a deterministic mechanism to complete the transfer. The interface for copy-on-read is just an option within qemu-img create. Streaming, on the other hand, requires a bit more thought. Today, I have a monitor command, stream, which will try to stream the minimal amount of data for a single I/O operation and then return how many sectors were successfully streamed. The idea about how to drive this interface is a loop like:

    offset = 0
    while offset < image_size:
        wait_for_idle_time()
        count = stream(device, offset)
        offset += count

This is way too low level for the management stack. Have you considered using the idle class I/O priority to implement this? That would allow host-wide prioritization. 
Not sure how to do it cluster-wide; I don't think NFS has a concept of I/O priority. -- error compiling committee.c: too many arguments to function -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
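The commit/copy-on-read symmetry described earlier in the thread can be made concrete as a runnable toy, with images modeled as dicts of sector → data and "mapped" meaning present in the leaf image (purely illustrative, not the block layer's data model):

```python
def commit(image, backing):
    # Push every sector mapped in the leaf down into the backing image.
    for sector, data in image.items():
        backing[sector] = data

def copy_on_read_pass(image, backing):
    # The reversed direction: pull every sector the leaf lacks up from
    # the backing image (what a background streaming pass would do).
    for sector, data in backing.items():
        if sector not in image:
            image[sector] = data
```

The guard in `copy_on_read_pass` is what keeps newer leaf data from being clobbered by stale backing data, mirroring the `if not image.mapped(sector)` test in the pseudocode above.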
Re: [libvirt] KVM doesn't send an arp announce after live migrating a domain
On 08/25/2010 02:42 PM, Daniel P. Berrange wrote: Is virt-manager able to drive this? it would be great if you could drive everything from there. Yes, it does now, under the menu Edit -> Host Details -> Network Interfaces NetworkManager has also finally learnt to ignore ifcfg-XXX files which have a BRIDGE= setting in them, so it shouldn't totally trash your guest bridge networking if you leave NM running. Cool. I guess what remains is to get people to unlearn all the previous hacks. (also would be nice to have libvirt talk to NetworkManager instead of /etc/sysconfig) -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] KVM doesn't send an arp announce after live migrating a domain
On 08/25/2010 02:36 PM, Daniel P. Berrange wrote: Can't libvirt also create a non-NAT bridge? Looks like it would prevent a lot of manual work and opportunity for misconfiguration. Yes, it can on latest Fedora/RHEL6, using the netcf library. This is the new 'virsh iface-XXX' command set (and equivalent APIs). I've not updated the docs to cover this functionality yet though. It also does bonding, and vlans, etc Great. Is virt-manager able to drive this? it would be great if you could drive everything from there. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] KVM doesn't send an arp announce after live migrating a domain
On 08/25/2010 02:15 PM, Daniel P. Berrange wrote: So it looks like the default config uses the kernel default? If libvirt uses an existing bridge I agree it shouldn't hack it, but if it creates its own can't it use a sensible default? That is the NAT virtual network. That one *does* default to a forward delay of 0, but since it is NAT, it is fairly useless for migration in any case. If you do 'virsh net-dumpxml default' you should see that delay='0' was added. The OP was using bridging rather than NAT though, so this XML example doesn't apply. My comments about libvirt not overriding kernel policy for forward delay were WRT full bridging mode, not the NAT mode[1] Yes, of course. Can't libvirt also create a non-NAT bridge? Looks like it would prevent a lot of manual work and opportunity for misconfiguration. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] KVM doesn't send an arp announce after live migrating a domain
On 08/25/2010 01:52 PM, Daniel P. Berrange wrote: I think libvirt is doing something about this, copying list for further info. libvirt doesn't set a policy for this. It provides an API for configuring host networking, but we don't override the kernel's forward delay policy, since we don't presume that all bridges are going to have VMs attached. In any case the API isn't available for Debian yet, since no one has ported netcf to Debian, so I assume the OP set bridging up manually. The '15' second default is actually a kernel level default IIRC. The two main host network configs recommended for use with libvirt+KVM (either NAT or bridging) are documented here: http://wiki.libvirt.org/page/Networking From that page: # virsh net-define /usr/share/libvirt/networks/default.xml From my copy of that file: default So it looks like the default config uses the kernel default? If libvirt uses an existing bridge I agree it shouldn't hack it, but if it creates its own can't it use a sensible default? -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] KVM doesn't send an arp announce after live migrating a domain
On 08/25/2010 12:21 PM, Nils Cant wrote: On 08/25/2010 10:38 AM, Gleb Natapov wrote: qemu sends gratuitous ARP after migration. Check forward delay setting on your bridge interface. It should be set to zero. Aha! That fixed it. Turns out that debian bridge-utils sets the default to 15 for bridges. Manually setting it to 0 with 'brctl setfd br0 0' or setting the 'bridge_fd' parameter to 0 in /etc/network/interfaces solves the issue. I think libvirt is doing something about this, copying list for further info. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
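For Debian, the persistent form of that fix lives in /etc/network/interfaces. A hypothetical bridge stanza (interface names are examples) with the forward delay zeroed would look like:

```
auto br0
iface br0 inet dhcp
    # bridge_fd 0 disables the forward delay, so migrated guests'
    # gratuitous ARPs are forwarded immediately instead of after 15s
    bridge_ports eth0
    bridge_stp off
    bridge_fd 0
```

This is the file-based equivalent of `brctl setfd br0 0`, applied automatically at every boot.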
Re: [libvirt] [Qemu-devel] Re: [PATCH] Introduce a -libvirt-caps flag as a stop-gap
On 07/27/2010 07:38 PM, Anthony Liguori wrote: I'm going to revert the -help changes for 0.13 so that old versions of libvirt work but not for master. What is the goal here? Make qemu.git explicitly be unusable via libvirt? -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] Re: Libvirt debug API
On 04/26/2010 05:48 PM, Anthony Liguori wrote: We could easily reuse that. Any other security context code would be custom written; so it can be written as a qemud plugin instead of a bit of code that goes before a qemu launch. I think we're mostly in agreement with respect to the need to have more control over the security context the qemu runs in. Whether it's launched via a daemon or directly I think is an implementation detail that we can debate when we get closer to an actual implementation. Good, as I haven't decided yet which side I'm on. -- error compiling committee.c: too many arguments to function -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] Re: Libvirt debug API
On 04/26/2010 05:28 PM, Anthony Liguori wrote: Or a library that the user-written launcher calls. Or a plugin that qemud calls. A plugin would lose the security context. It could attempt to recreate it, but that seems like a lot of unnecessary complexity. A plugin would create the security context instead of the launcher. Currently security contexts are created by the login process. We could easily reuse that. Any other security context code would be custom written; so it can be written as a qemud plugin instead of a bit of code that goes before a qemu launch. -- error compiling committee.c: too many arguments to function -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [Qemu-devel] Re: Libvirt debug API
On 04/26/2010 05:25 PM, Chris Lalancette wrote:
> Right, and you are probably one of the users this work targets. But in general, for those not very familiar with virtualization/qemu, we want to steer them far clear of this API. That goes doubly true for application developers; we want them to be able to use a stable, long-term API and not have to worry about the nitty-gritty details of the monitor. It's that latter group that we want to make sure doesn't use this API.

With qmp, we have a stable long term API, and the nitty-gritty details are easily hidden behind a stock json parser (unfortunately some rpc details remain). The command line is baroque, but the libvirt xml isn't so pretty either.

The problem is a user that starts with libvirt and outgrows its featureset. Do we want them to fall back to qmp?

--
error compiling committee.c: too many arguments to function
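The claim that a stock json parser hides most of QMP's nitty-gritty can be shown in a few lines. A minimal sketch in Python — the helper names are hypothetical, but the `{"execute": ...}` envelope, the `{"QMP": ...}` greeting, and the `qmp_capabilities` negotiation step are the actual QMP protocol:

```python
import json

def parse_qmp(line):
    """Parse one line of QMP server output into a plain dict."""
    return json.loads(line)

def build_command(name, **arguments):
    """Serialize a QMP command; keyword args become the JSON 'arguments' object."""
    cmd = {"execute": name}
    if arguments:
        cmd["arguments"] = arguments
    return json.dumps(cmd)

# On connect, QEMU sends a greeting wrapped in a "QMP" key (abridged payload):
greeting = parse_qmp('{"QMP": {"version": {}, "capabilities": []}}')
assert "QMP" in greeting

# Capability negotiation must precede any other command:
print(build_command("qmp_capabilities"))  # {"execute": "qmp_capabilities"}
```

Past this point everything is ordinary dict manipulation; no protocol-specific client library is required, which is the "stock json parser" argument in miniature.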
Re: [libvirt] [Qemu-devel] Re: Libvirt debug API
On 04/26/2010 05:19 PM, Anthony Liguori wrote:
> On 04/26/2010 09:01 AM, Avi Kivity wrote:
>> On 04/26/2010 04:43 PM, Anthony Liguori wrote:
>>> The reason I lean toward the direct launch model is that it gives the user a lot of flexibility in terms of using things like namespaces, DAC, cgroups, capabilities, etc. A lot of potential features are lost when you do indirect launch because you have to teach the daemon how to support each of these features.
>>
>> But what's the alternative? Teach the user how to do all these things?
>
> You can expose layers of API. The lowest layer makes no changes to the security context. A higher (optional) layer could do dynamic labelling.

Or a library that the user-written launcher calls. Or a plugin that qemud calls.

>> It's infinitely flexible, but it's not an API you can give to a management tool developer.
>
> I think the goal of a management API should be to make common things very simple to do but not preclude doing even the most advanced things.

Agreed.

--
error compiling committee.c: too many arguments to function
Re: [libvirt] [Qemu-devel] Re: Libvirt debug API
On 04/26/2010 04:43 PM, Anthony Liguori wrote:
> The reason I lean toward the direct launch model is that it gives the user a lot of flexibility in terms of using things like namespaces, DAC, cgroups, capabilities, etc. A lot of potential features are lost when you do indirect launch because you have to teach the daemon how to support each of these features.

But what's the alternative? Teach the user how to do all these things? It's infinitely flexible, but it's not an API you can give to a management tool developer.

--
error compiling committee.c: too many arguments to function
Re: [libvirt] [Qemu-devel] Re: Libvirt debug API
On 04/26/2010 04:46 PM, Anthony Liguori wrote:
>>> (3) The system management application can certainly create whatever context it wants to launch a vm from. It comes down to who's responsible for creating the context the guest runs under. I think doing that at the libvirt level takes away a ton of flexibility from the management application.
>>
>> If you want to push the flexibility slider all the way to the right you get bare qemu. It exposes 100% of qemu capabilities. And it's not so bad these days. But it's not something that can be remoted.
>
> As I mentioned earlier, remoting is not a very important use-case to me. Does RHEV-M actually use the remote libvirt interface? I assume it'll talk to vdsm via some protocol and vdsm will use the local libvirt API.

Yes.

> I suspect most uses of libvirt are actually local uses.

I expect the same, though I'm sure a design goal was to make use of libvirt be reasonable through the remote API. If we aren't able to fulfil it, much of the value of libvirt goes away.

--
error compiling committee.c: too many arguments to function
Re: [libvirt] [Qemu-devel] Re: Libvirt debug API
On 04/26/2010 04:14 PM, Anthony Liguori wrote:

IOW, libvirt does not run guests as separate users which is why it needs to deal with security in the first place.

What if one user has multiple guests? isolation is still needed.

Don't confuse a management application's concept of users with using separate uid's to launch guests.

Then someone needs to manage those users. A user can't suid to any random user. You need someone privileged to allocate the new uid and su into it.

One user per guest does not satisfy some security requirements. The 'M' in selinux stands for mandatory, which means that the entities secured can't leak information even if they want to (scenario: G1 breaks into qemu, chmods files, G2 breaks into qemu, reads files).

If you're implementing a Chinese wall policy, then yes, you want to run each guest as a separate selinux context. Starting as separate users and setting DAC privileges appropriately will achieve this. But you're not always implementing that type of policy. If the guest inherits the uid, selinux context, and namespaces of whatever launches the guest, then you have the most flexibility from a security perspective.

How do you launch a libvirt guest in a network namespace? How do you put it in a chroot? You pass the namespace fd and chroot fd using SCM_RIGHTS (except you probably can't do that).

Today, you have to make changes to libvirt whereas in a direct launch model, you get all of the neat security features linux supports for free.

But you lose tap networking, unless you have a privileged helper. And how is the privileged helper to authenticate the qemu calling it?

And I've said in the past that I don't like the idea of a qemud :-)

I must have missed it. Why not? Every other hypervisor has a central management entity.

Because you end up launching all guests from a single security context.

Run multiple qemuds? But what you say makes sense. It's similar to the fork() /* do interesting stuff */ exec() model, compared to the spawn(..., hardcoded list of interesting stuff).

Yeah, that's where I'm at. I'd eventually like libvirt to use our provided API and I can see where it would add value to the stack (by doing things like storage and network management).

We do provide an API, qmp, and libvirt uses it?

Yeah, but we need to support more features (like guest enumeration).

What are our options?

1) qemud launches, enumerates
2) user launches, qemu registers in qemud
3) user launches, qemu registers in filesystem
4) you launched it, you enumerate it

That's wrong for three reasons. First, selinux is not a uid replacement (if it was libvirt could just suid $random_user before launching qemu). Second, a single user's guests should be protected from each other. Third, in many deployments, the guest's owner isn't logged in to supply the credentials, it's system management that launches the guests.

(1) uid's are just one part of an application's security context. There's an selinux context, all of the various namespaces, capabilities, etc. If you use a daemon to launch a guest, you lose all of that unless you have a very sophisticated api.

True. In a perfect world, we'd use SCM_RIGHTS to channel all of these to libvirt or qemud. On the other hand, users don't want to do all these things by hand. They want management to do things for them. Self launch is very flexible, but it's not an API, and cannot be used remotely. We could use qemud plugins to allow the user to customize the launch process.

(2) If you want to implement a policy that only a single guest can access a single image, you can create an SELinux policy and use static labelling to achieve that. That's just one type of policy though.

It's also not going to work in an environment that doesn't preserve all security labels (like direct access to volumes; /dev is on tmpfs these days).
(3) The system management application can certainly create whatever context it wants to launch a vm from. It comes down to who's responsible for creating the context the guest runs under. I think doing that at the libvirt level takes away a ton of flexibility from the management application.

If you want to push the flexibility slider all the way to the right you get bare qemu. It exposes 100% of qemu capabilities. And it's not so bad these days. But it's not something that can be remoted.

--
error compiling committee.c: too many arguments to function
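The SCM_RIGHTS idea floated above — handing an already-open namespace or chroot directory fd to another process — is mechanically straightforward over a Unix socket. A minimal sketch under stated assumptions: the helper names are made up for illustration, and a real launcher/qemud would wrap this exchange in its own handshake:

```python
import array
import os
import socket
import tempfile

def send_fd(sock, fd):
    # SCM_RIGHTS ancillary data: the kernel installs a duplicate of fd
    # in the receiving process's descriptor table.
    sock.sendmsg([b"F"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS,
                           array.array("i", [fd]))])

def recv_fd(sock):
    fds = array.array("i")
    msg, ancdata, flags, addr = sock.recvmsg(1, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            fds.frombytes(data[:fds.itemsize])
            return fds[0]
    raise RuntimeError("no file descriptor received")

# Demo: pass an open file across a socketpair, the way a launcher could
# hand a pre-opened namespace or chroot fd to a privileged helper.
left, right = socket.socketpair()
with tempfile.TemporaryFile() as f:
    f.write(b"hello")
    f.flush()
    send_fd(left, f.fileno())
    fd = recv_fd(right)          # a fresh descriptor for the same open file
    os.lseek(fd, 0, os.SEEK_SET)
    assert os.read(fd, 5) == b"hello"
    os.close(fd)
left.close()
right.close()
```

The caveat in the thread stands: this moves descriptors, not the full security context, so selinux labels and the like still need separate handling.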
Re: [libvirt] [Qemu-devel] Re: Libvirt debug API
On 04/26/2010 04:53 AM, Anthony Liguori wrote:
On 04/25/2010 06:51 AM, Avi Kivity wrote:

It depends on what things you think are important. A lot of libvirt's complexity is based on the fact that it uses a daemon and needs to deal with the security implications of that. You don't need explicit labelling if you don't use a daemon.

I don't follow. If you have multiple guests that you want off each other's turf you have to label their resources, either statically or dynamically. How is it related to a daemon being present?

Because libvirt has to perform this labelling because it loses the original user's security context. If you invoke qemu with the original user's credentials that launched the guest, then you don't need to do anything special with respect to security. IOW, libvirt does not run guests as separate users which is why it needs to deal with security in the first place.

What if one user has multiple guests? isolation is still needed. One user per guest does not satisfy some security requirements. The 'M' in selinux stands for mandatory, which means that the entities secured can't leak information even if they want to (scenario: G1 breaks into qemu, chmods files, G2 breaks into qemu, reads files).

This is really the qemu model (as opposed to the xend model).

(and the qemud model).

And I've said in the past that I don't like the idea of a qemud :-)

I must have missed it. Why not? Every other hypervisor has a central management entity.

In theory, it does support this with the session urls but they are currently second-class citizens in libvirt. The remote dispatch also adds a fair bit of complexity and at least for the use-cases I'm interested in, it's not an important feature.

If libvirt needs a local wrapper for interesting use cases, then it has failed. You can't have a local wrapper with the esx driver, for example. This is off-topic, but can you detail why you don't want remote dispatch (I assume we're talking about a multiple node deployment).
Because there are dozens of remote management APIs and they all have a concept of agents that run on the end nodes. When fitting virtualization management into an existing management infrastructure, you are going to always use a local API.

When you manage esx, do you deploy an agent? I thought it was all done via their remote APIs.

Every typical virtualization use will eventually grow some non-typical requirements. If libvirt explicitly refuses to support qemu features, I don't see how we can recommend it - even if it satisfies a user's requirements today, what about tomorrow? what about future qemu features, will they be exposed or not? If that is the case then we should develop qemud (which libvirt and other apps can use). (even if it isn't the case I think qemud is a good idea)

Yeah, that's where I'm at. I'd eventually like libvirt to use our provided API and I can see where it would add value to the stack (by doing things like storage and network management).

We do provide an API, qmp, and libvirt uses it?

That's not what the libvirt community wants to do. We're very biased. We've made decisions about how features should be exposed and what features should be included. We want all of those features exposed exactly how we've implemented them because we think it's the right way. I'm not sure there's an obvious way forward unless we decide that there is going to be two ways to interact with qemu. One way is through the libvirt world-view and the other is through a more qemu centric view. The problem then becomes allowing those two models to co-exist happily together.

I don't think there's a point in managing qemu through libvirt and directly in parallel. It means a user has to learn both APIs, and for every operation they need to check both to see what's the best way of exploiting the feature. There will invariably be some friction. Layers need to stack on top of each other, not live side by side or bypass each other.
I agree with you theoretically but practically, I think it's immensely useful as a stop-gap.

Sure. But please let's not start being clever with transactions and atomic operations and stuff, it has to come with a label that says, if you're using this, then something is wrong.

The alternative is to get libvirt to just act as a thin layer to expose qemu features directly. But honestly, what's the point of libvirt if they did that?

For most hypervisors, that's exactly what libvirt does. For Xen, it also bypasses Xend and the hypervisor's API, but it shouldn't really.

Historically, xend was so incredibly slow (especially for frequent statistics collection) that it was a necessity.

Ah, reimplement rathe
Re: [libvirt] [Qemu-devel] Re: Libvirt debug API
On 04/23/2010 09:33 PM, Anthony Liguori wrote:
>> This is a different ambiguity, about the semantic results of the commands, whereas I'm referring to the execution order. If I look at a libvirt log file and see a set of JSON commands logged, I want to know that this ordering from the logs, was indeed the same as the order in which qemu processed them. If you have two separate monitor connections you can't be sure of the order of execution. It is key for our bug troubleshooting that given a libvirt log file, we can replay the JSON commands again and get the same results. Two monitor connections is just increasing complexity of code without any tangible benefit.
>
> I think you're assuming direct access to the second monitor? I'm not suggesting that. I'm suggesting that libvirt is still the one submitting commands to the second monitor and that it submits those commands in lock step.

What about protocol extensions? For instance, pretend libvirt doesn't support async messages, what would it do when it receives one from the user's monitor?

--
error compiling committee.c: too many arguments to function
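The async-message question is concrete because QMP interleaves unsolicited events with command replies on the same connection, so any consumer sitting on a monitor has to demultiplex the two. A hypothetical sketch — in QMP, events carry an `"event"` key while replies carry `"return"` or `"error"`:

```python
import json

def demux(lines):
    """Split raw QMP server lines into (events, responses)."""
    events, responses = [], []
    for line in lines:
        msg = json.loads(line)
        # A client that doesn't understand a given event can at least
        # route it aside instead of mistaking it for a command reply.
        (events if "event" in msg else responses).append(msg)
    return events, responses

stream = [
    '{"return": {}}',
    '{"event": "STOP", "timestamp": {"seconds": 0, "microseconds": 0}}',
    '{"error": {"class": "CommandNotFound", "desc": "unknown command"}}',
]
events, responses = demux(stream)
assert [m["event"] for m in events] == ["STOP"]
assert len(responses) == 2
```

A proxy that lacks this routing step would stall or misparse the moment an event arrives between a command and its reply, which is exactly the failure mode raised above.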
Re: [libvirt] [Qemu-devel] Re: Libvirt debug API
On 04/25/2010 06:39 AM, Anthony Liguori wrote:
On 04/24/2010 04:46 AM, Avi Kivity wrote:
On 04/23/2010 09:29 PM, Anthony Liguori wrote:

Maybe. We'll still have issues. For example, sVirt: if a QMP command names a labeled resource, the non-libvirt user will have no way of knowing how to label it.

This is orthogonal to QMP and has to do strictly with how libvirt prepares a resource for qemu.

It's not orthogonal. If you allow qmp access behind libvirt's back, it's a problem that you will have.

My point was, if libvirt is just exposing raw qemu features, then it should be possible for qemu to arbitrate concurrent access. If libvirt implements features on top of qemu, then no other third party will be able to co-exist with those features without interacting with qemu. It's an impossible problem for qemu to solve (arbitrating access to state stored in a third party management app).

If libvirt implements features (like sVirt or network configuration) then it is indeed impossible for qemu to arbitrate. If we take all those features into qemu[d], then it becomes possible to arbitrate so long as the libvirt and the other management app don't step on each other's toes. But that's impossible to guarantee if you upgrade your libvirt while keeping the other app unchanged.

1) Allow libvirt users to access features of qemu that are not exposed through libvirt

That's an artificial problem. If libvirt exposes all features, you don't need to solve it.

It won't. Otherwise, we wouldn't be having this discussion.

Then libvirt will fade into uselessness. A successful app using libvirt will grow, and will have new requirements. As soon as libvirt doesn't meet those new requirements, the app will need to talk to qemu[d] directly. Once it does that, it may as well use qemu[d] for everything; if you can talk QMP and generate qemu command lines, there's not much that libvirt buys you. Even the cross-hypervisor support is not that hard to implement, especially if you only need to satisfy your own requirements.

2) Provide a means for non-libvirt users to interact with qemu

We have qmp. It doesn't do multiple guest management. I think it's reasonable to have a qemud which does (and also does sVirt and the zillion other things libvirt does) provided we remove them from libvirt (long term). The only problem is that it's a lot of effort.

It depends on what things you think are important. A lot of libvirt's complexity is based on the fact that it uses a daemon and needs to deal with the security implications of that. You don't need explicit labelling if you don't use a daemon.

I don't follow. If you have multiple guests that you want off each other's turf you have to label their resources, either statically or dynamically. How is it related to a daemon being present?

This is really the qemu model (as opposed to the xend model).

(and the qemud model).

In theory, it does support this with the session urls but they are currently second-class citizens in libvirt. The remote dispatch also adds a fair bit of complexity and at least for the use-cases I'm interested in, it's not an important feature.

If libvirt needs a local wrapper for interesting use cases, then it has failed. You can't have a local wrapper with the esx driver, for example. This is off-topic, but can you detail why you don't want remote dispatch (I assume we're talking about a multiple node deployment).

3) Provide a unified and interoperable view of the world for non-libvirt and libvirt users

This problem can be solved by the non-libvirt users adopting libvirt, or the libvirt users dropping libvirt. I don't understand why we need to add interoperability between users who choose an interoperability library and users who don't choose an interoperability library.

What I'd like to avoid is user confusion. Should a user use libvirt or libqemu? If they make a decision to use libqemu and then down the road want to use libvirt, how hard is it to switch? Fragmentation hurts the ecosystem and discourages good applications from existing. I think it's our responsibility to ensure there's a good management API that exists for qemu that we can actively recommend to our users. libvirt is very good at typical virtualization uses of qemu but qemu is much more than just that and has lots of advanced features.

Every typical virtualization use will eventually grow some non-typical requirements. If libvirt explicitly refuses to support qemu features, I don't see how we can recommend it - even if it satisfies a user's requirements today, what about tomorrow? what about future qemu features, will they be exposed or not? If that is the case then we should develop qemud (which libvirt and other apps ca
Re: [libvirt] [Qemu-devel] Re: Libvirt debug API
On 04/23/2010 09:29 PM, Anthony Liguori wrote:

Maybe. We'll still have issues. For example, sVirt: if a QMP command names a labeled resource, the non-libvirt user will have no way of knowing how to label it.

This is orthogonal to QMP and has to do strictly with how libvirt prepares a resource for qemu.

It's not orthogonal. If you allow qmp access behind libvirt's back, it's a problem that you will have.

Much better to exact a commitment from libvirt to track all QMP (and command line) capabilities. Instead of adding cleverness to QMP, add APIs to libvirt.

Let's step back for a minute because I think we're missing the forest through the trees. We're trying to address a few distinct problems:

1) Allow libvirt users to access features of qemu that are not exposed through libvirt

That's an artificial problem. If libvirt exposes all features, you don't need to solve it.

2) Provide a means for non-libvirt users to interact with qemu

We have qmp. It doesn't do multiple guest management. I think it's reasonable to have a qemud which does (and also does sVirt and the zillion other things libvirt does) provided we remove them from libvirt (long term). The only problem is that it's a lot of effort.

3) Provide a unified and interoperable view of the world for non-libvirt and libvirt users

This problem can be solved by the non-libvirt users adopting libvirt, or the libvirt users dropping libvirt. I don't understand why we need to add interoperability between users who choose an interoperability library and users who don't choose an interoperability library.

For (1), we all agree that the best case scenario would be for libvirt to support every qemu feature. I think we can also all agree though that this is not really practical and certainly not practical for developers since there is a development cost associated with libvirt support (to model an API appropriately).

All except me, perhaps. We already have two layers of feature modeling: first, we mostly emulate real life, not invent new features. PCI hotplug existed long before qemu had support for it. Second, we do give some thought into how we expose it through QMP. libvirt doesn't have to invent it again, it only has to expose it through its lovely xml and C APIs.

The new API proposed addresses (1) by allowing a user to drill down to the QMP context. It's a good solution IMHO and I think we all agree that there's an inherent risk to this that users will have to evaluate on a case-by-case basis. It's a good stop-gap though.

Agree.

(2) is largely addressed by QMP and a config file. I'd like to see a nice C library, but I think a lot of other folks are happy with JSON support in higher level languages.

I agree with them. C is a pretty bad choice for managing qemu (or even, C is a pretty bad choice).

(3) is the place where there are still potential challenges. I think at the very least, our goal should be to enable conversion from (2) and (1) to be as easy as possible. That's why I have proposed implementing a C library for the JSON transport because we could plumb that through the new libvirt API. This would allow a user to very quickly port an application from QMP to libvirt. In order to do this, we need the libvirt API to expose a dedicated monitor because we'll need to be able to manipulate events and negotiate features.

Most likely any application that talks QMP will hide the protocol behind a function call interface anyway.

Beyond simple porting, there's a secondary question of having non-libvirt apps co-exist with libvirt apps. I think it's a good long term goal, but I don't think we should worry too much about it now.

libvirt needs to either support all but the most esoteric use cases, or to get out of the way completely.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [libvirt] [Qemu-devel] Re: Libvirt debug API
On 04/23/2010 04:48 PM, Anthony Liguori wrote:
On 04/23/2010 07:48 AM, Avi Kivity wrote:
On 04/22/2010 09:49 PM, Anthony Liguori wrote:

Another problem is issuing Monitor commands that could confuse libvirt's real API. Say, adding a device libvirt doesn't know about or stopping the VM while libvirt thinks it's still running or anything like that.

We need to make libvirt and qemu smarter. We already face this problem today with multiple libvirt users. This is why sophisticated management mechanisms (like LDAP) have mechanisms to do transactions or at least a series of atomic operations.

And people said qmp/json was overengineered... But seriously, transactions won't help anything. qemu maintains state, and when you have two updaters touching a shared variable not expecting each other to, things break, no matter how much locking there is.

Let's consider some concrete examples. I'm using libvirt and QMP and in QMP, I want to hot unplug a device. Today, I do this by listing the pci devices, and issuing a pci_del that takes a PCI address. This is intrinsically racy though because in the worst case scenario, in between when I enumerate pci devices and do the pci_del in QMP, in libvirt, I've done a pci_del and then a pci_add within libvirt of a completely different device.

Obviously you should do the pci_del through libvirt. Once libvirt supports an API, use it.

There are a few ways to solve this, the simplest being that we give devices unique ids that are never reused and instead of pci_del taking a pci bus address, it takes a device id. That would address this race.

You can get very far by just being clever about unique ids and notifications. There are some cases where a true RMW may be required but I can't really think of one off hand.

The way LDAP addresses this is that it has a batched operation and a simple set of boolean comparison operations. This lets you execute a batched operation that will do a RMW.

I'm sure we can be very clever, but I'd rather direct this cleverness to qemu core issues, not to the QMP (which in turn requires that users be clever to use it correctly). QMP is a low bandwidth protocol, so races will never show up in testing. We're laying mines here for users to step on that we will never encounter ourselves.

The only way that separate monitors could work is if they touch completely separate state, which is difficult to ensure if you upgrade your libvirt.

I don't think this is as difficult of a problem as you think it is. If you look at Active Directory and the whole set of management tools based on it, they certainly allow concurrent management applications. You can certainly get into trouble still but with just some careful considerations, you can make two management applications work together 90% of the time without much fuss on the applications part.

Maybe. We'll still have issues. For example, sVirt: if a QMP command names a labeled resource, the non-libvirt user will have no way of knowing how to label it.

Much better to exact a commitment from libvirt to track all QMP (and command line) capabilities. Instead of adding cleverness to QMP, add APIs to libvirt.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
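The unique-id fix discussed here is easy to sketch: with caller-chosen ids, unplug names the device directly and the racy enumerate-then-delete dance disappears. The `device_add`/`device_del` shapes below follow QMP's qdev commands; the address-based `pci_del` form is reconstructed from the discussion for contrast, so treat its argument name as illustrative:

```python
# Address-based deletion (racy): the PCI address may have been reused by
# another management connection between enumeration and this call.
def pci_del(pci_addr):
    return {"execute": "pci_del", "arguments": {"pci_addr": pci_addr}}

# Id-based hotplug: the caller picks a unique, never-reused id at
# device_add time, so deletion needs no prior enumeration step.
def device_add(driver, dev_id, **props):
    args = {"driver": driver, "id": dev_id}
    args.update(props)
    return {"execute": "device_add", "arguments": args}

def device_del(dev_id):
    return {"execute": "device_del", "arguments": {"id": dev_id}}

plug = device_add("virtio-net-pci", "net0", mac="52:54:00:12:34:56")
unplug = device_del("net0")   # unambiguous no matter what ran in between
```

The race shrinks to "does `net0` still exist", which qemu can answer atomically with an error, instead of silently unplugging the wrong device.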
Re: [libvirt] [Qemu-devel] Re: Libvirt debug API
On 04/22/2010 09:49 PM, Anthony Liguori wrote:
>> Another problem is issuing Monitor commands that could confuse libvirt's real API. Say, adding a device libvirt doesn't know about or stopping the VM while libvirt thinks it's still running or anything like that.
>
> We need to make libvirt and qemu smarter. We already face this problem today with multiple libvirt users. This is why sophisticated management mechanisms (like LDAP) have mechanisms to do transactions or at least a series of atomic operations.

And people said qmp/json was overengineered...

But seriously, transactions won't help anything. qemu maintains state, and when you have two updaters touching a shared variable not expecting each other to, things break, no matter how much locking there is. The only way that separate monitors could work is if they touch completely separate state, which is difficult to ensure if you upgrade your libvirt.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [libvirt] [Qemu-devel] Re: Supporting hypervisor specific APIs in libvirt
On 03/25/2010 10:18 AM, Alexander Graf wrote:
>> libqemu.so would be a C API. C is not the first choice for writing GUIs or management applications. So it would need to be further wrapped. We also need to allow qemu to control the display directly, without going through vnc.
>
> For the current functionality I tend to disagree. All that we need is an shm vnc extension that allows the GUI and qemu to not send image data over the wire, but only the dirtiness information.

It still means an extra copy. I don't think we want to share the guest framebuffer (it includes offscreen bitmaps), so we'll need to copy it somewhere else. It's even worse with qxl/spice where there is no framebuffer.

> As soon as we get to 3D things might start to look different.

Very different.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [libvirt] [Qemu-devel] Re: Supporting hypervisor specific APIs in libvirt
On 03/26/2010 10:37 AM, Markus Armbruster wrote:
>> The importance of libqemu is:
>>
>> 1) Providing a common QMP transport implementation that is extensible by third parties
>> 2) Providing a set of common transports that support automatic discovery of command line launched guests
>> 3) Providing a generic QMP dispatch function
>
> Adding to this C wrappers for QMP commands threatens to make QMP command arguments part of the library ABI. Compatible QMP evolution (like adding an optional argument) turns into a libqmp soname bump. Counter-productive. How do you plan to avoid that?

You could make the API use QObjects; then you're completely isolated from high level protocol changes. Of course, this is less useful than the full API.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
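Markus's soname concern and the QObject answer can be contrasted in Python terms (hypothetical helpers; only `migrate`'s `uri` argument is taken from QMP itself). A fixed-signature wrapper freezes the argument list into the library interface, while a free-form builder passes arguments through untyped, so a compatibly added optional argument costs nothing:

```python
# Fixed wrapper: adding an optional QMP argument later means changing
# this signature -- the C equivalent is an ABI break and a soname bump.
def migrate_fixed(uri):
    return {"execute": "migrate", "arguments": {"uri": uri}}

# QObject-style builder: arguments travel as an untyped mapping, so
# compatible protocol growth never touches the library interface.
def command(name, **arguments):
    msg = {"execute": name}
    if arguments:
        msg["arguments"] = arguments
    return msg

# A hypothetical new optional flag rides along with no interface change:
newer = command("migrate", uri="tcp:dst:4444", detach=True)
```

The trade-off is exactly the one conceded in the reply: the free-form path is isolated from protocol changes but gives callers none of the convenience or checking of per-command wrappers.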
Re: [libvirt] [Qemu-devel] Re: Supporting hypervisor specific APIs in libvirt
On 03/25/2010 03:57 PM, Anthony Liguori wrote:
> On 03/25/2010 08:48 AM, Avi Kivity wrote:
>>> But an awful lot of the providers for pegasus are written in C.
>>
>> But we're concerned with only one, the virt provider. None of the others will use libqemu?
>>
>>> The point is, C is a lowest common denominator and it's important to support in a proper way.
>>
>> Problem is, it means horrible support for everyone else.
>
> Why? We can provide a generic QMP dispatch interface that high level languages can use. Then they can do fancy dispatch, treat QErrors as exceptions, etc.

Sure, with high level wrappers everything's fine.

--
error compiling committee.c: too many arguments to function
Re: [libvirt] [Qemu-devel] Re: Supporting hypervisor specific APIs in libvirt
On 03/25/2010 03:44 PM, Anthony Liguori wrote:
> On 03/25/2010 07:37 AM, Avi Kivity wrote:
>> On 03/25/2010 02:33 PM, Anthony Liguori wrote:
>>>> From my point of view, i wouldn't want to write a high level management toolstack in C, specially since the API is well defined JSON which is easily available in all high level language out there.
>>>
>>> There's a whole world of C based management toolstacks (CIM).
>>
>> Gratefully I know very little about CIM, but isn't it language independent? The prominent open source implementation, pegasus, is written in C++.
>
> There is also SFCB which is written in C.

Ok.

> But an awful lot of the providers for pegasus are written in C.

But we're concerned with only one, the virt provider. None of the others will use libqemu?

> The point is, C is a lowest common denominator and it's important to support in a proper way.

Problem is, it means horrible support for everyone else.

--
error compiling committee.c: too many arguments to function
Re: [libvirt] [Qemu-devel] Re: Supporting hypervisor specific APIs in libvirt
On 03/25/2010 02:33 PM, Anthony Liguori wrote:

From my point of view, I wouldn't want to write a high level management toolstack in C, especially since the API is well defined JSON which is easily available in all high level languages out there.

There's a whole world of C based management toolstacks (CIM).

Gratefully I know very little about CIM, but isn't it language independent? The prominent open source implementation, pegasus, is written in C++. Or are you referring to specific management apps written in C? If they go through CIM, how can they talk qmp?

-- error compiling committee.c: too many arguments to function
Re: [libvirt] [Qemu-devel] Re: Supporting hypervisor specific APIs in libvirt
On 03/25/2010 10:26 AM, Vincent Hanquez wrote:

On 24/03/10 21:40, Anthony Liguori wrote:

If so, what C clients do you expect beyond libvirt? Users want a C API. I don't agree that libvirt is the only C interface consumer out there.

(I've seen this written too many times ...) How do you know that? Did you do a poll or something where *actual* users vote/tell? From my point of view, I wouldn't want to write a high level management toolstack in C, especially since the API is well defined JSON which is easily available in all high level languages out there.

Strongly agreed. Even the managementy bits of qemu (anything around QObject) are suffering from the lowleveledness of C.

-- error compiling committee.c: too many arguments to function
Re: [libvirt] [Qemu-devel] Re: Supporting hypervisor specific APIs in libvirt
On 03/24/2010 10:32 PM, Anthony Liguori wrote:

So far, a libqemu.so with a flexible transport that could be used directly by a libvirt user (a la cairo/gdk type interactions) seems like the best solution to me.

libqemu.so would be a C API. C is not the first choice for writing GUIs or management applications. So it would need to be further wrapped. We also need to allow qemu to control the display directly, without going through vnc.

-- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [libvirt] [Qemu-devel] Re: Supporting hypervisor specific APIs in libvirt
On 03/24/2010 06:42 PM, Luiz Capitulino wrote:

On Wed, 24 Mar 2010 12:42:16 +0200 Avi Kivity wrote:

So, at best qemud is a toy for people who are annoyed by libvirt.

Is the reason for doing this in qemu because libvirt is annoying?

Mostly.

I don't see how adding yet another layer/daemon is going to improve our or our users' lives (the same applies for libqemu).

libvirt becomes optional.

If I got it right, there were two complaints from the kvm-devel flamewar:

1. Qemu has usability problems
2. There's no way an external tool can get /proc/kallsyms info from Qemu

I don't see how libqemu can help with 1) and having qemud doesn't seem the best solution for 2) either. Still talking about 2), what's wrong with getting the PID or having a QMP connection in a well known location as suggested by Anthony?

I now believe that's the best option.

-- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [libvirt] [Qemu-devel] Re: Supporting hypervisor specific APIs in libvirt
On 03/24/2010 02:32 PM, Anthony Liguori wrote:

You don't get a directory filled with a zillion socket files pointing at dead guests.

Agree that's a poor return on investment.

Deleting it on atexit combined with flushing the whole directory at startup is a pretty reasonable solution to this (which is ultimately how the entirety of /var/run behaves). If you're really paranoid, you can fork() a helper with a shared pipe to implement unlink on close.

My paranoia comes nowhere near to my dislike of forked helpers.

-- error compiling committee.c: too many arguments to function
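The atexit-plus-sweep approach described above is easy to sketch. A connect probe distinguishes a live socket file from a stale one, since a unix domain socket with no listener refuses connections. Below is a minimal sketch (Python, Unix host assumed; the function names and the run directory are invented for the example, since qemud was only ever a proposal):

```python
import atexit
import os
import socket

def sweep_stale_sockets(run_dir):
    """Flush the directory at startup: a socket file whose owning process
    is gone refuses connections, so a failed connect probe marks it stale."""
    removed = []
    for name in sorted(os.listdir(run_dir)):
        path = os.path.join(run_dir, name)
        probe = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            probe.connect(path)        # someone is still listening: keep it
        except (ConnectionRefusedError, FileNotFoundError):
            os.unlink(path)            # dead guest: remove the entry
            removed.append(name)
        finally:
            probe.close()
    return removed

def publish_socket(run_dir, name):
    """Bind a listening socket and arrange for unlink on clean exit."""
    path = os.path.join(run_dir, name)
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(path)
    srv.listen(1)
    atexit.register(lambda: os.path.exists(path) and os.unlink(path))
    return srv
```

The startup sweep is what keeps the scheme honest after a crash or SIGKILL, where the atexit hook never runs; it plays the role that clearing /var/run at boot plays for system daemons.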
Re: [libvirt] [Qemu-devel] Re: Supporting hypervisor specific APIs in libvirt
On 03/24/2010 02:30 PM, Paul Brook wrote:

On 03/23/2010 09:24 PM, Anthony Liguori wrote:

We also provide an API for guest creation (the qemu command line).

As an aside, I'd like to see all command line options have qmp equivalents (most of them can be implemented with a 'set' command that writes qdev values). This allows a uniform way to control a guest, whether at startup or runtime. You start with a case, cold-plug a motherboard, cpus, memory, disk controllers, and power it on.

The main blocker to this is converting all the devices to qdev. "Partial" conversions are not sufficient. It's approximately the same problem as a machine config file. If you have one then the other should be fairly trivial.

Agreed.

IMO the no_user flag is a bug, and should not exist.

Sorry, what's that?

-- error compiling committee.c: too many arguments to function
Re: [libvirt] [Qemu-devel] Re: Supporting hypervisor specific APIs in libvirt
On 03/24/2010 02:30 PM, Anthony Liguori wrote:

On 03/24/2010 07:27 AM, Avi Kivity wrote:

On 03/24/2010 02:19 PM, Anthony Liguori wrote:

qemud
- daemonizes itself
- listens on /var/lib/qemud/guests for incoming guest connections
- listens on /var/lib/qemud/clients for incoming client connections
- filters access according to uid (SCM_CREDENTIALS)
- can pass a new monitor to client (SCM_RIGHTS)
- supports 'list' command to query running guests
- async messages on guest startup/exit

Then guests run with the wrong security context.

Why? They run with the security context of whoever launched them (could be libvirtd).

Because it doesn't have the same security context as qemud and since clients have to connect to qemud, qemud has to implement access control.

Yeah. It's far better to have the qemu instance advertise itself such that a client connects directly to it. Then all of the various authorization models will be applied correctly to it.

Agreed. qemud->exit().

-- error compiling committee.c: too many arguments to function
Re: [libvirt] [Qemu-devel] Re: Supporting hypervisor specific APIs in libvirt
On 03/24/2010 02:23 PM, Anthony Liguori wrote:

On 03/24/2010 05:42 AM, Avi Kivity wrote:

The filtering access part of this daemon is also not mapping well onto libvirt's access model, because we don't solely filter based on UID in libvirtd. We have it configurable based on UID, policykit, SASL, TLS/x509 already, and intend adding role based access control to further filter things, integrating with the existing apparmor/selinux security models. A qemud that filters based on UID only gives users a side-channel to get around libvirt's access control.

That's true. Any time you write a multiplexer these issues crop up. Much better to stay in single process land where everything is already taken care of.

What does a multiplexer give you that making individual qemu instances discoverable doesn't give you? The latter doesn't suffer from these problems.

You don't get a directory filled with a zillion socket files pointing at dead guests.

Agree that's a poor return on investment. Maybe we want an O_UNLINK_ON_CLOSE for unix domain sockets - but no, that's not implementable.

-- error compiling committee.c: too many arguments to function
Re: [libvirt] [Qemu-devel] Re: Supporting hypervisor specific APIs in libvirt
On 03/24/2010 02:19 PM, Anthony Liguori wrote:

qemud
- daemonizes itself
- listens on /var/lib/qemud/guests for incoming guest connections
- listens on /var/lib/qemud/clients for incoming client connections
- filters access according to uid (SCM_CREDENTIALS)
- can pass a new monitor to client (SCM_RIGHTS)
- supports 'list' command to query running guests
- async messages on guest startup/exit

Then guests run with the wrong security context.

Why? They run with the security context of whoever launched them (could be libvirtd).

-- error compiling committee.c: too many arguments to function
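The SCM_RIGHTS step in the sketch above - qemud handing a fresh monitor connection to a client - is ordinary descriptor passing over a unix domain socket. A minimal illustration (Python 3.9+ on a Unix host; a pipe stands in for the monitor channel, and the function names are invented for the example):

```python
import os
import socket

def hand_out_monitor(server_sock):
    """qemud side (sketch): create a fresh pipe standing in for a monitor
    channel, pass its read end to the client via SCM_RIGHTS, and keep
    the write end for qemud itself."""
    mon_read, mon_write = os.pipe()
    socket.send_fds(server_sock, [b"monitor"], [mon_read])  # SCM_RIGHTS
    os.close(mon_read)          # the client now holds its own copy
    return mon_write

def receive_monitor(client_sock):
    """Client side (sketch): receive the descriptor carried in the
    SCM_RIGHTS ancillary data alongside the regular payload."""
    msg, fds, _flags, _addr = socket.recv_fds(client_sock, 1024, maxfds=1)
    assert msg == b"monitor"
    return fds[0]
```

The SCM_CREDENTIALS filtering mentioned in the spec would sit on the same socket: before handing out a monitor, the daemon checks the connecting peer's uid from the credentials the kernel attaches to the connection.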
Re: [libvirt] [Qemu-devel] Re: Supporting hypervisor specific APIs in libvirt
On 03/24/2010 12:36 PM, Daniel P. Berrange wrote:

On Wed, Mar 24, 2010 at 07:17:26AM +0200, Avi Kivity wrote:

On 03/23/2010 08:00 PM, Avi Kivity wrote:

On 03/23/2010 06:06 PM, Anthony Liguori wrote:

I thought the monitor protocol *was* our API. If not, why not?

It is. But our API is missing key components like guest enumeration. So the fundamental topic here is, do we introduce these missing components to allow people to build directly to our interface or do we make use of the functionality that libvirt already provides if they can plumb our API directly to users.

Guest enumeration is another API. Over the kvm call I suggested a qemu concentrator that would keep track of all running qemus, and would hand out monitor connections to users. It can do the enumeration (likely using qmp). Libvirt could talk to that, like it does with other hypervisors.

To elaborate:

qemud
- daemonizes itself
- listens on /var/lib/qemud/guests for incoming guest connections
- listens on /var/lib/qemud/clients for incoming client connections
- filters access according to uid (SCM_CREDENTIALS)
- can pass a new monitor to client (SCM_RIGHTS)
- supports 'list' command to query running guests
- async messages on guest startup/exit

My concern is that once you provide this, then next someone wants it to list inactive guests too.

That's impossible, since qemud doesn't manage config files or disk images. It can't even launch guests!

Once you list inactive guests, then you'll want this to start a guest. Once you start guests then you want cgroups integration, selinux labelling & so on, until it ends up replicating all of libvirt's QEMU functionality. To be able to use the list functionality from libvirt, we need this daemon to also guarantee id, name & uuid uniqueness for all VMs, both running and inactive, with separate namespaces for the system vs per-user lists. Or we have to ignore any instances listed by qemud that were not started by libvirt, which rather defeats the purpose.

qemud won't guarantee name uniqueness or provide uuids.

The filtering access part of this daemon is also not mapping well onto libvirt's access model, because we don't solely filter based on UID in libvirtd. We have it configurable based on UID, policykit, SASL, TLS/x509 already, and intend adding role based access control to further filter things, integrating with the existing apparmor/selinux security models. A qemud that filters based on UID only gives users a side-channel to get around libvirt's access control.

That's true. Any time you write a multiplexer these issues crop up. Much better to stay in single process land where everything is already taken care of.

So, at best qemud is a toy for people who are annoyed by libvirt.

-- error compiling committee.c: too many arguments to function
Re: [libvirt] [Qemu-devel] Re: Supporting hypervisor specific APIs in libvirt
On 03/23/2010 09:24 PM, Anthony Liguori wrote:

We also provide an API for guest creation (the qemu command line).

As an aside, I'd like to see all command line options have qmp equivalents (most of them can be implemented with a 'set' command that writes qdev values). This allows a uniform way to control a guest, whether at startup or runtime. You start with a case, cold-plug a motherboard, cpus, memory, disk controllers, and power it on. I would also like a way to read the entire qdev tree from qmp.

-- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [libvirt] [Qemu-devel] Re: Supporting hypervisor specific APIs in libvirt
On 03/23/2010 08:00 PM, Avi Kivity wrote:

On 03/23/2010 06:06 PM, Anthony Liguori wrote:

I thought the monitor protocol *was* our API. If not, why not?

It is. But our API is missing key components like guest enumeration. So the fundamental topic here is, do we introduce these missing components to allow people to build directly to our interface or do we make use of the functionality that libvirt already provides if they can plumb our API directly to users.

Guest enumeration is another API. Over the kvm call I suggested a qemu concentrator that would keep track of all running qemus, and would hand out monitor connections to users. It can do the enumeration (likely using qmp). Libvirt could talk to that, like it does with other hypervisors.

To elaborate:

qemud
- daemonizes itself
- listens on /var/lib/qemud/guests for incoming guest connections
- listens on /var/lib/qemud/clients for incoming client connections
- filters access according to uid (SCM_CREDENTIALS)
- can pass a new monitor to client (SCM_RIGHTS)
- supports 'list' command to query running guests
- async messages on guest startup/exit

qemu
- with -qemud option, connects to qemud (or maybe automatically?)

qemudc
- command-line client, can access qemu human monitor

-- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [libvirt] [Qemu-devel] Supporting hypervisor specific APIs in libvirt
On 03/23/2010 09:31 PM, Anthony Liguori wrote:

One problem is that this is libvirt version specific. For example, libvirt x doesn't support spice so we control that through qmp. But libvirt x+1 does support spice and now it gets confused about all the spice messages.

That's only a problem if we only support a single QMP session. This is exactly why we need to support multiple QMP sessions (and do).

It's unrelated to the number of sessions. libvirt expects state that it manages in qemu not to change randomly. Users know that, so they will only manage non-libvirt state in their private session. But a new version of libvirt may expand its scope and start managing this area, leading to conflicts.

-- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [libvirt] [Qemu-devel] Re: Supporting hypervisor specific APIs in libvirt
On 03/23/2010 08:23 PM, Daniel P. Berrange wrote:

On Tue, Mar 23, 2010 at 08:00:21PM +0200, Avi Kivity wrote:

On 03/23/2010 06:06 PM, Anthony Liguori wrote:

I thought the monitor protocol *was* our API. If not, why not?

It is. But our API is missing key components like guest enumeration. So the fundamental topic here is, do we introduce these missing components to allow people to build directly to our interface or do we make use of the functionality that libvirt already provides if they can plumb our API directly to users.

Guest enumeration is another API. Over the kvm call I suggested a qemu concentrator that would keep track of all running qemus, and would hand out monitor connections to users. It can do the enumeration (likely using qmp). Libvirt could talk to that, like it does with other hypervisors.

The libvirt QEMU driver started out as a fairly simple "concentrator" not doing much beyond spawning QEMU with argv & issuing monitor commands. The host concentrator inevitably needs to be involved in the OS level integration with features such as cgroups, selinux/apparmor, host NIC management, storage, iptables, etc. If you look at the daemons for Xen, VirtualBox, VMWare, that other libvirt drivers talk to, they all do far more than just enumeration of VMs. A QEMU concentrator may start out simple, but it will end up growing over time to re-implement much, if not all, the stuff that libvirt already provides for QEMU in terms of host level APIs.

The idea is not to replace libvirt, but provide something that sits underneath. It wouldn't do any non-qemu host-level APIs.

If the core problem here is to provide app developers access to the full range of QEMU functionality, then re-implementing the entire libvirt QEMU driver is a rather over-the-top way to achieve that.

It's trivial to expose all qemu functionality by exposing a qmp connection.

-- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
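Part of what makes exposing a raw qmp connection cheap is that every QMP message is a self-describing JSON object: the server greeting carries a "QMP" key, command replies carry "return", failures carry "error", and asynchronous events carry "event" plus a timestamp. A consumer sharing the connection only has to demultiplex on those top-level keys - a sketch (the helper name is invented; the message shapes follow the QMP wire format):

```python
import json

def classify_qmp(line):
    """Classify one line of a QMP stream as the greeting, a command
    reply, an error reply, or an asynchronous event (the traffic a
    second client such as libvirt would also have to cope with)."""
    msg = json.loads(line)
    if "QMP" in msg:
        return "greeting"   # sent once by the server on connect
    if "event" in msg:
        return "event"      # async: may arrive between any two replies
    if "error" in msg:
        return "error"      # a command that failed
    if "return" in msg:
        return "reply"      # a command that succeeded
    raise ValueError("not a recognized QMP message: %r" % (msg,))
```

For example, `{"event": "SPICE_CONNECTED", "timestamp": {...}, ...}` classifies as an event regardless of which session's command provoked it, which is exactly the consistency hazard discussed in this thread.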
Re: [libvirt] [Qemu-devel] Re: Supporting hypervisor specific APIs in libvirt
On 03/23/2010 06:06 PM, Anthony Liguori wrote:

I thought the monitor protocol *was* our API. If not, why not?

It is. But our API is missing key components like guest enumeration. So the fundamental topic here is, do we introduce these missing components to allow people to build directly to our interface or do we make use of the functionality that libvirt already provides if they can plumb our API directly to users.

Guest enumeration is another API. Over the kvm call I suggested a qemu concentrator that would keep track of all running qemus, and would hand out monitor connections to users. It can do the enumeration (likely using qmp). Libvirt could talk to that, like it does with other hypervisors.

-- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [libvirt] [Qemu-devel] Supporting hypervisor specific APIs in libvirt
On 03/22/2010 09:25 PM, Anthony Liguori wrote:

Hi, I've mentioned this to a few folks already but I wanted to start a proper thread. We're struggling in qemu with usability and one area that concerns me is the disparity in features that are supported by qemu vs what's implemented in libvirt. This isn't necessarily libvirt's problem if its mission is to provide a common hypervisor API that covers the most commonly used features. However, for qemu, we need an API that covers all of our features that people can develop against. The ultimate question we need to figure out is, should we encourage our users to always use libvirt or should we build our own API for people (and libvirt) to consume. I don't think it's necessarily a big technical challenge for libvirt to support qemu more completely. I think it amounts to introducing a series of virQemu APIs that implement qemu specific functions. Over time, qemu specific APIs can be deprecated in favour of more generic virDomain APIs. What's the feeling about this from the libvirt side of things? Is there interest in supporting hypervisor specific interfaces, or should we be looking to provide our own management interface for libvirt to consume?

One option is to expose a qmp connection to the client. Of course that introduces a consistency problem (libvirt plugs in a card, the user plugs in their own, libvirt is confused). If the user promises to behave, it can work for stuff that's 100% orthogonal to libvirt.

One problem is that this is libvirt version specific. For example, libvirt x doesn't support spice so we control that through qmp. But libvirt x+1 does support spice and now it gets confused about all the spice messages.

-- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
[libvirt] Re: [Qemu-devel] [PATCH 1/6] Allow multiple monitor devices (v2)
Anthony Liguori wrote:

Hollis Blanchard wrote:

On Wed, 2009-04-08 at 13:34 -0500, Anthony Liguori wrote:

Right now only one monitor device can be enabled at a time. In order to support asynchronous notification of events, I would like to introduce a 'wait' command that waits for an event to occur. This implies that we need an additional monitor session to allow commands to still be executed while waiting for an asynchronous notification.

Was there any consensus reached in this thread? I'm once again looking for ways to communicate qemu watchdog events to libvirt.

We can do multiple monitors as a debugging tool, but to support events, a proper machine monitor mode is a prerequisite. The real requirement is that events are obtainable via a single communication channel instead of requiring two separate communication channels. Internal implementation will look a lot like these patches. The reasoning for requiring a single channel is that coordinating between the two channels is expected to be prohibitively difficult. To have a single channel, we need a machine mode. It cannot be done in a human readable fashion. I think this summarizes the consensus we reached. I don't agree fully with the above but I'm okay with it.

If you don't agree with it, it isn't a consensus.

Would you agree, Avi?

It represents my views fairly accurately. I'm not convinced that you can't do event notifications without machine mode, but on the other hand I do think introducing machine mode and layering notifications on top of that is the best way to proceed, so I can't complain.

-- error compiling committee.c: too many arguments to function
Re: [libvirt] Re: [Qemu-devel] [PATCH 1/6] Allow multiple monitor devices (v2)
Jamie Lokier wrote:

Avi Kivity wrote:

Daniel P. Berrange wrote:

Yes indeed it's a little crazy :-) As Anthony mentioned, if libvirt were able to be notified of changes a user makes in the monitor, there's no reason we could not allow end users to access the monitor of a VM libvirt is managing. We just need to make sure libvirt doesn't miss changes like attaching or detaching block devices, etc, because that'll cause crash/data loss later when libvirt migrates or does save/restore, etc, because it'll launch QEMU with wrong args.

You still have an inherent race here.

user: plug in disk
libvirt: start migration, still without disk
qemu: libvirt, a disk has been plugged in.

Then fix it. The race is not necessary.

user: plug in a disk
libvirt: lock VM against user changes incompatible with migration
qemu: libvirt, lock granted
libvirt: query for current disk state
libvirt: start migration, knows about the disk

The "libvirt, a disk has been plugged in" will be delivered but it's not important. libvirt queries the state of things after it acquires the lock and before it starts migration.

Migration is supposed to be transparent. You're reducing quality of service if you're disabling features while migrating.

That means that to debug a problem in the field you have to locate a guest's host, and follow it around as it migrates (or disable migration).

That's right, you do. Is there any way to debug a guest without disabling migration? I don't think there is at present, so of course you have to disable migration when you debug. Another reason for that "lock against migration" mentioned above.

Nothing prevents you from debugging a guest during migration. You'll have to reconnect to the monitor, but that's it.

-- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [libvirt] Re: [Qemu-devel] [PATCH 1/6] Allow multiple monitor devices (v2)
Daniel P. Berrange wrote:

On Tue, Apr 14, 2009 at 12:15:23PM +0300, Avi Kivity wrote:

Daniel P. Berrange wrote:

Yes indeed it's a little crazy :-) As Anthony mentioned, if libvirt were able to be notified of changes a user makes in the monitor, there's no reason we could not allow end users to access the monitor of a VM libvirt is managing. We just need to make sure libvirt doesn't miss changes like attaching or detaching block devices, etc, because that'll cause crash/data loss later when libvirt migrates or does save/restore, etc, because it'll launch QEMU with wrong args.

You still have an inherent race here.

user: plug in disk
libvirt: start migration, still without disk
qemu: libvirt, a disk has been plugged in.

That is true, but we'd still be considering direct monitor access to be an 'expert' mode of use. If they wish to shoot themselves in the foot by triggering a migration at the same time they are hotplugging, I'm fine if their whole leg gets blown away.

What if the system triggers migration automatically (as you'd expect)? And that's just one example. I'm sure there are more. libvirt issues commands expecting some state in qemu. It can't learn of that state from listening on another monitor, because there are delays between the state changing and the notification. If you want things to work reliably, you have to follow the chain of command.

-- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [libvirt] Re: [Qemu-devel] [PATCH 1/6] Allow multiple monitor devices (v2)
Jan Kiszka wrote:

That is true, but we'd still be considering direct monitor access to be an 'expert' mode of use. If they wish to shoot themselves in the foot by triggering a migration at the same time they are hotplugging, I'm fine if their whole leg gets blown away.

...while there is also nothing that speaks against blocking any device hot-plugging while migration is ongoing, independent of whether there is some management app involved or the user himself plays with multiple monitors.

If the management is doing the hotplugging, it should just do it on both sides.

-- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [libvirt] Re: [Qemu-devel] [PATCH 1/6] Allow multiple monitor devices (v2)
Daniel P. Berrange wrote:

Yes indeed it's a little crazy :-) As Anthony mentioned, if libvirt were able to be notified of changes a user makes in the monitor, there's no reason we could not allow end users to access the monitor of a VM libvirt is managing. We just need to make sure libvirt doesn't miss changes like attaching or detaching block devices, etc, because that'll cause crash/data loss later when libvirt migrates or does save/restore, etc, because it'll launch QEMU with wrong args.

You still have an inherent race here.

user: plug in disk
libvirt: start migration, still without disk
qemu: libvirt, a disk has been plugged in.

I don't see how adding those low-level monitor things to libvirt is an improvement - debugging and scripted keystrokes are not the sort of functionality libvirt is for - or is it?

I think it could probably be argued that sending fake keystrokes could be within scope. Random ad-hoc debugging probably out of scope.

That means that to debug a problem in the field you have to locate a guest's host, and follow it around as it migrates (or disable migration).

-- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
[libvirt] Re: [Qemu-devel] [PATCH 1/6] Allow multiple monitor devices (v2)
Anthony Liguori wrote:

Avi Kivity wrote:

Anthony Liguori wrote:

IMHO, multiple monitors is a critical feature to support in the long term.

Multiple monitors are nice to have (for developers), but I don't see them as critical.

If you live in a world where there is a single management application that provides the only interface to interact with a QEMU instance, then yes, they aren't critical.

I do (or at least I hope I do). Exposing the monitor to users is a layering violation.

The problem with this is that most management applications are lossy by their nature. They expose only a subset of functionality supported by QEMU.

What if they don't expose a feature because they don't want to make the feature available to the user? What happens when the user changes something that the management application thinks it controls? Do we add notifiers on everything? The qemu monitor is a different privilege level from being a virtual machine owner. Sure, we could theoretically plug all the holes - for example, the user filling up the disk with screendumps. But do we want to reduce security this way?

You're taking away control from the management application, due to what are the management application's misfeatures. You should instead tell the vendor of your management application to add the missing feature. Oh, and don't expect users of a management application to connect to the qemu monitor to administer their virtual machines. They expect the management application to do that for them. The qemu monitor is an excellent way to control a single VM, but not for controlling many.

Currently, the monitor is the "management interface" for QEMU. If we only ever support one instance of that management interface, then it means if multiple management applications are to interact with a given QEMU instance, they must all use a single API to do that, which then allows for multiplexing. I see no reason that QEMU shouldn't do the multiplexing itself though.

Again, I don't oppose multiplexing (though I do oppose the wait command which requires it), and I oppose this "management apps suck, let's telnet to qemu directly" use you propose.

To put it another way, a user that uses libvirt today cannot see QEMU instances that are run manually. That is not true when a user uses libvirt with Xen today because Xend provides a management interface that is capable of supporting multiple clients. I think it's important to get the same level of functionality for QEMU. N.B. yes, Xend is a horrendous example, especially when your argument has been simplicity vs. complexity.

I'm sure libvirt really enjoys it when users use xm commands to change the VM state. What happens when you migrate it, for example? Or add a few dozen vcpus?

At the end of the day, I want to be able to run a QEMU instance from the command line, and have virt-manager be able to see it remotely and connect to it. That means multiple monitors and it means that all commands that change VM state must generate some sort of notification such that libvirt can keep track of the changing state of a VM.

I don't think most management application authors would expose the qemu monitor to users. It sounds like a huge risk, and for what benefit? If there's something interesting you can do with the monitor, add it to the management interface so people can actually use it. They don't buy this stuff so they can telnet into the monitor.

-- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
[libvirt] Re: [Qemu-devel] [PATCH 1/6] Allow multiple monitor devices (v2)
Anthony Liguori wrote:

What's the established practice? Do you know of any protocol that is line based that does notifications like this?

Actually there is one line oriented protocol that does asynchronous notifications: http://faqs.org/rfcs/rfc1459.html

-- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
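RFC 1459 (the original IRC protocol) frames everything as one CRLF-terminated line: an optional ":prefix", a command, and parameters, the last of which may follow a " :". Numeric commands are replies to something the client sent; alphabetic commands like PRIVMSG or PING can arrive unsolicited at any time on the same connection. A simplified parser showing how a client separates the two kinds of traffic (edge cases of the full grammar omitted):

```python
def parse_irc_line(line):
    """Split one RFC 1459 message into (prefix, command, params).
    The trailing parameter after ' :' may contain spaces."""
    prefix = None
    if line.startswith(":"):
        prefix, line = line[1:].split(" ", 1)
    if " :" in line:
        line, trailing = line.split(" :", 1)
        params = line.split() + [trailing]
    else:
        params = line.split()
    command, params = params[0], params[1:]
    return prefix, command, params

def is_reply(command):
    """Numeric commands (e.g. 001) answer a client request; anything
    else may be pushed asynchronously by the server."""
    return command.isdigit()
```

This is the pattern the thread is circling around: one line-based channel carrying both command replies and asynchronous notifications, distinguished purely by message shape.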
[libvirt] Re: [Qemu-devel] [PATCH 1/6] Allow multiple monitor devices (v2)
Anthony Liguori wrote: Avi Kivity wrote: (qemu) notify vnc on ... time passes, we want to allow members of group x to log in (qemu) vnc_set_acl group:x OK (qemu) notification: vnc connect aliguori (qemu) with a single monitor, we can be sure that the connect happened the vnc_set_acl. If the notification arrives on a different session, we have no way of knowing that. Only because there isn't a time stamp associated with the completion of the other monitor command. And you can globally replace timestamp with some sort of incrementing id that's associated with each notification and command completion. Sure, you can fix the problem, but why introduce it in the first place? I understand the urge for a simple command/response, but introducing multiple sessions breaks the "simple" and introduces new problems. You'll need this to support multiple monitors even with your model. Can you explain why? As far as I can tell, if you have async notifications, you can do everything from one monitor. IMHO, multiple monitors is a critical feature to support in the long term. Multiple monitors are nice to have (for developers), but I don't see them as critical. I expect that in the short term future, we'll have a non-human monitor mode that allows commands to be asynchronous. Then let's defer this until then? 'wait' is not useful for humans, they won't be retyping 'wait' every time something happens. But wait is useful for management apps today. A wait-forever, which is already in the next series, is also useful for humans. It may not be a perfect interface, but it's a step in the right direction. We have time before the next release and I expect that we'll have a non-human mode before then. I disagree, I think requiring multiple sessions for controlling a single application is clumsy. I can't think of one protocol which uses it. I don't think IMAP requires multiple sessions (and I don't think commands from one session can affect the other, except through the mail store). 
What's the established practice? Do you know of any protocol that is line based that does notifications like this? I guess most MUDs?

I've never used a MUD before; I think that qualifies as before my time :-)

Well, I haven't either. Maybe time to start.

IMAP IDLE is pretty close to "wait-forever".

IMAP IDLE can be terminated by the client, and so does not require multiple sessions (though IMAP supports them).

Most modern clients use multiple sessions. If you look at IMAP, it doesn't multiplex commands, so multiple sessions are necessary to maintain usefulness while performing a long running task.

But commands in one session don't affect others. Anyway, I think terminating a wait is a perfectly reasonable requirement.

It breaks your command/response, though.

-- Do not meddle in the internals of kernels, for they are subtle and quick to panic.

-- Libvir-list mailing list Libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
[libvirt] Re: [Qemu-devel] [PATCH 1/6] Allow multiple monitor devices (v2)
Anthony Liguori wrote: Avi Kivity wrote:

Fine, let's say we did that; it's *still* racy, because at time 3 the guest may hot remove cpu 2 on its own, since the guest's VCPUs get to run in parallel to the monitor.

A guest can't hotremove a vcpu. It may offline a vcpu, but that's not the same. Obviously, if both the guest and the management application can initiate the same action, then there will be races. But I don't think that's how things should be -- the guest should request a vcpu to be removed (or added), management thinks and files forms in triplicate, then hotadds or hotremoves the vcpu (most likely after it is no longer needed). With the proper bureaucracy, there is no race.

You still have the same basic problem:

time 0: (qemu) notify-enable vnc-events
time 1: (qemu) foo
time 4:
time 5: notification: client connected

time 0: vnc client connects
time 2: vnc client disconnects

At time 5, when the management app gets the notification, it cannot make any assumptions about the state of the system. You still need timestamps.

You don't even need the foo to trigger this; qemu->user traffic can be arbitrarily delayed (I don't think we should hold notifications on partial input anyway). But there's no race here. The notification at time 5 means that the connect happened sometime before time 5, and that it may not be true now. The user cannot assume anything. A race can only happen against something the user initiated. Suppose we're implementing some kind of single sign on:

(qemu) notify vnc on
... time passes, we want to allow members of group x to log in
(qemu) vnc_set_acl group:x
OK
(qemu) notification: vnc connect aliguori
(qemu)

With a single monitor, we can be sure that the connect happened after the vnc_set_acl. If the notification arrives on a different session, we have no way of knowing that.

And even if you somehow eliminate the issue around masking notifications, you still have socket buffering that introduces the same problem.
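The single-monitor ordering argument above can be sketched in code. This is a hypothetical illustration (the function name is invented; the line formats follow the transcript above): because command completions and notifications share one ordered stream, a client can tell whether an event happened before or after a command simply by whether it reads the event before or after the command's "OK".

```python
def classify_events(stream_lines):
    """Split notifications into those ordered before vs. after a
    command's completion on a single merged monitor stream."""
    before, after = [], []
    completed = False
    for line in stream_lines:
        if line == "OK":            # in-band completion of vnc_set_acl
            completed = True
        elif line.startswith("notification:"):
            (after if completed else before).append(line)
    return before, after

# One session, one ordered stream: no timestamps or sequence ids needed.
stream = [
    "notification: vnc connect anonymous",  # read before OK
    "OK",                                   # vnc_set_acl completed
    "notification: vnc connect aliguori",   # read after OK
]
before, after = classify_events(stream)
```

With two sessions, the same two notifications could be delivered in either order relative to the "OK", and this classification would be meaningless.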
If you have one monitor, the problem is much simpler, since events travelling in the same direction (command acknowledge and a notification) cannot be reordered. With a command+wait, the problem is inherent.

Command acknowledge is not an event. Events are out-of-band. Command completions are in-band. Right now, they are synchronous and ...

That's all correct, but I don't see how that changes anything.

I expect that in the short term future, we'll have a non-human monitor mode that allows commands to be asynchronous.

Then let's defer this until then? 'wait' is not useful for humans; they won't be retyping 'wait' every time something happens.

However, it's a mistake to muddle the distinction between an in-band completion and an out-of-band event. You cannot relate the out-of-band events to commands.

I can, if one happens before the other, and I have a single stream of command completions and event notifications.

The best you can do is stick a time stamp on a notification and make sure the management tool understands that the notification is reflective of the state when the event happened, not of the current state.

Timestamps are really bad. They don't work at all if the management application is not on the same host. They work badly if it is on the same host, since commands and events will be timestamped at different processes.

Timestamps are relative, not absolute. They should not be used to associate anything with the outside world. In fact, I have no problem making the timestamps relative to QEMU startup just to ensure that no one tries to do something silly like associate notification timestamps with system time.

Dunno, seems totally artificial to me to have to introduce timestamps to compensate for different delays in multiple sockets that we introduced five patches earlier. Please, let's keep this simple.

FWIW, this problem is not at all unique to QEMU and is generally true of most protocols that support an out-of-band notification mechanism.
command+wait makes it worse. Let's stick with established practice.

What's the established practice? Do you know of any protocol that is line based that does notifications like this? I guess most MUDs?

IMAP IDLE is pretty close to "wait-forever".

IMAP IDLE can be terminated by the client, and so does not require multiple sessions (though IMAP supports them).
[libvirt] Re: [Qemu-devel] [PATCH 1/6] Allow multiple monitor devices (v2)
Anthony Liguori wrote: Avi Kivity wrote:

Suppose you have a command which changes the meaning of a notification. If a notification arrives before the command completion, then it happened before the command was executed.

If you want to make that reliable, you cannot have multiple monitors.

Right.

Since you can mask notifications, there can be an arbitrarily long time between notification and the event happening. Socket buffering presents the same problem. Imagine:

Monitor 1:
time 0: (qemu) hotadd_cpu 2
time 1: (qemu) hello world
time 5:
time 6: notification: cpu 2 added
time 6: (qemu)

Monitor 2:
time 3: (qemu) hotremove_cpu 2
time 4: (qemu)
time 5: notification: cpu 2 removed
time 6: (qemu)

So to eliminate this, you have to ban multiple monitors.

Well, not ban multiple monitors, but require that for non-racy operation commands and notifications be on the same session. We can still debug on our dev-only monitor.

Fine, let's say we did that; it's *still* racy, because at time 3 the guest may hot remove cpu 2 on its own, since the guest's VCPUs get to run in parallel to the monitor.

A guest can't hotremove a vcpu. It may offline a vcpu, but that's not the same. Obviously, if both the guest and the management application can initiate the same action, then there will be races. But I don't think that's how things should be -- the guest should request a vcpu to be removed (or added), management thinks and files forms in triplicate, then hotadds or hotremoves the vcpu (most likely after it is no longer needed). With the proper bureaucracy, there is no race.

And even if you somehow eliminate the issue around masking notifications, you still have socket buffering that introduces the same problem.

If you have one monitor, the problem is much simpler, since events travelling in the same direction (command acknowledge and a notification) cannot be reordered. With a command+wait, the problem is inherent.
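The two-monitor timeline above can be reduced to a toy sketch (names and delivery times invented for illustration): each session buffers and delivers its own notifications independently, so the delivery order the management application observes across sessions need not match the order in which the events actually happened.

```python
def deliver(sessions):
    """Merge per-session (delivery_time, message) queues by delivery time,
    i.e. the order in which a reader of both sockets would see them."""
    merged = []
    for msgs in sessions.values():
        merged.extend(msgs)
    return [msg for _, msg in sorted(merged)]

sessions = {
    # hotadd happened first (time 0), but its notification is buffered
    # on session 1 until time 6; the removal on session 2 arrives at 5.
    "monitor1": [(6, "notification: cpu 2 added")],
    "monitor2": [(5, "notification: cpu 2 removed")],
}
seen = deliver(sessions)
# The app observes the removal before the addition -- the opposite of
# the order in which the events occurred.
```

On a single session this inversion cannot happen, because both notifications would sit in the same ordered stream.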
The best you can do is stick a time stamp on a notification and make sure the management tool understands that the notification is reflective of the state when the event happened, not of the current state.

Timestamps are really bad. They don't work at all if the management application is not on the same host. They work badly if it is on the same host, since commands and events will be timestamped at different processes.

FWIW, this problem is not at all unique to QEMU and is generally true of most protocols that support an out-of-band notification mechanism.

command+wait makes it worse. Let's stick with established practice.
Re: [libvirt] Re: [Qemu-devel] [PATCH 1/6] Allow multiple monitor devices (v2)
Jan Kiszka wrote:

I'm not sure I understand your questions. Multiple monitor sessions are like multiple shell sessions. I don't think a control program should use more than one session, but it should allow a developer to connect to issue 'info registers' and 'x/20i' commands. Of course if a developer issues 'quit' or a hotunplug command, things will break.

We agree if we want decoupled states of the monitor sessions (one session should definitely not be used to configure the output of another one). But I see no issues with collecting the events in one session that happen to be caused by activity in some other session. But maybe I'm missing your point.

The management application will still think the device is plugged in, and things will break if it isn't. Of course if you asked for notification X on session Y, then event X should be delivered to session Y no matter how it originated (but not to session Z).

Please no more async notifications to the monitors. They are just ugly to parse, at least for us humans. I don't want to see any notification in the middle of my half-typed command, e.g.

If we can identify an interactive session, we might redraw the partial command after the prompt.

Uhh, please not this kind of terminal reprinting. The terminal user should keep full control over when things can be printed.

Very well. I guess a human user can open another session for notifications, if they are so inclined.

btw, why would a human enable notifications? Note notifications enabled on the management session will only be displayed there.

It's true that the common use case for events will be monitor applications. Still, enabling them for testing or simple scripting should not switch on ugly output mode or complicate the parsing.

Fair enough.
[libvirt] Re: [Qemu-devel] [PATCH 1/6] Allow multiple monitor devices (v2)
Anthony Liguori wrote: Avi Kivity wrote:

I'm sorry, I don't see why. It's just like a shell session. Compare with:

Monitor 1:
(qemu) notify enospace on
(qemu) notify vnc-connect on
(qemu) notify migration-completion on
(qemu) migrate -d ...
(qemu) migrate_cancel
(qemu) migrate -d ...

Monitor 2:
(qemu) wait
vnc connection ...
(qemu) wait
enospc on ide0-0
(qemu) wait
migration cancelled
(qemu) wait
notification: migration completed

There is no way to tell by looking what has happened (well, in this case you can, but in the general case you cannot). You have to look at two separate interactive sessions (ctrl-alt-2 ctrl-alt-3 ctrl-alt-3). You have to keep reissuing the wait command. Oh, and it's racy, so if you're interested in what really happens you have to issue info commands on session 1.

How is it less racy?

Suppose you have a command which changes the meaning of a notification. If a notification arrives before the command completion, then it happened before the command was executed. If it arrives after command completion, then it happened after the command was executed.

Oh. If the command generates no output (like most), you can't tell when it completes. I suppose we could have qemu print OK after completing a command.
Re: [libvirt] Re: [Qemu-devel] [PATCH 1/6] Allow multiple monitor devices (v2)
Jan Kiszka wrote: Avi Kivity wrote: Gerd Hoffmann wrote: On 04/09/09 16:03, Avi Kivity wrote:

I don't want multiplexed monitor sessions, at all.

I'm very happy to finally see them. Finally one can run vms with libvirt and *still* access the monitor for debugging and development purposes.

Right, I like them for that purpose as well. But not for ordinary control.

How do you want to differentiate? What further complications would this bring us?

I'm not sure I understand your questions. Multiple monitor sessions are like multiple shell sessions. I don't think a control program should use more than one session, but it should allow a developer to connect to issue 'info registers' and 'x/20i' commands. Of course if a developer issues 'quit' or a hotunplug command, things will break.

Please no more async notifications to the monitors. They are just ugly to parse, at least for us humans. I don't want to see any notification in the middle of my half-typed command, e.g.

If we can identify an interactive session, we might redraw the partial command after the prompt.

btw, why would a human enable notifications? Note notifications enabled on the management session will only be displayed there.
[libvirt] Re: [Qemu-devel] [PATCH 1/6] Allow multiple monitor devices (v2)
Anthony Liguori wrote: Avi Kivity wrote:

I'd make everything line-oriented. Anything from the user up to \n is buffered and ignored until the \n arrives. Once the \n arrives, the command is acted upon atomically, either completing fully or launching an async notification. So the rules are: whenever the monitor is idle, a notification can be printed out.

So by idle, you mean, the end of the output buffer ends in either '\n' or '\n(qemu) '. The input buffer must also be empty.

You don't have to look at any buffers. If the monitor is processing a command, it is busy. An asynchronous command ('migrate -d') is not processed in the monitor after it is launched, so it doesn't keep the monitor busy. A monitor enters idle after printing the prompt, and leaves idle when it starts processing a command. If you meant from the user side, a notification always follows the prompt.

(qemu) notify enospace on
(qemu) notify vnc-connect on
(qemu) notification: vnc connection ...
(qemu) notify migration-completion on
(qemu) migrate -d ...
notification: enospc on ide0-0
(qemu) migrate_cancel
notification: migration cancelled
(qemu) migrate -d ...
(qemu) notification: migration completed

This hurts my eyes. It's not human readable.

I'm sorry, I don't see why. It's just like a shell session. Compare with:

Monitor 1:
(qemu) notify enospace on
(qemu) notify vnc-connect on
(qemu) notify migration-completion on
(qemu) migrate -d ...
(qemu) migrate_cancel
(qemu) migrate -d ...

Monitor 2:
(qemu) wait
vnc connection ...
(qemu) wait
enospc on ide0-0
(qemu) wait
migration cancelled
(qemu) wait
notification: migration completed

There is no way to tell by looking what has happened (well, in this case you can, but in the general case you cannot). You have to look at two separate interactive sessions (ctrl-alt-2 ctrl-alt-3 ctrl-alt-3). You have to keep reissuing the wait command. Oh, and it's racy, so if you're interested in what really happens you have to issue info commands on session 1.
That's unusable. If we're going to do this, we might as well have a non-human mode, which would oddly enough be more human readable.

If you do this, then your session looks an awful lot like my session from a previous note.

I think we should. I think the thing that is missing is that the 'wait' command does not have to be part of the non-human mode. In non-human mode, you are always doing an implicit wait.

I think 'wait' is unusable for humans. If I want qemu to tell me something happened, it's enough to enable notifications. There's no need to tell it to wait every time something happens. That's poll(2), there's no poll(1).
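The poll(2) analogy can be made concrete. A hypothetical sketch (function name and messages invented): with always-on notifications, a client simply blocks in select() on the monitor socket and handles notification lines as they arrive, with no 'wait' command to re-issue after every event.

```python
import select
import socket

def read_notifications(sock, limit):
    """Block in select() and collect notification lines as they arrive."""
    buf, out = b"", []
    while len(out) < limit:
        ready, _, _ = select.select([sock], [], [], 1.0)
        if not ready:
            break                       # no traffic: give up, like poll timeout
        buf += sock.recv(4096)
        while b"\n" in buf:             # line-oriented protocol
            line, buf = buf.split(b"\n", 1)
            if line.startswith(b"notification:"):
                out.append(line.decode())
    return out

# Simulate the monitor end with a socketpair.
monitor, client = socket.socketpair()
monitor.sendall(b"notification: migration completed\n"
                b"notification: enospc on ide0-0\n")
events = read_notifications(client, limit=2)
monitor.close()
client.close()
```

The client never sends anything; enabling notifications once is the only setup, which is the point of the poll(2)-not-poll(1) remark.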
[libvirt] Re: [Qemu-devel] [PATCH 1/6] Allow multiple monitor devices (v2)
Anthony Liguori wrote: Avi Kivity wrote:

(qemu) notify enospace on
(qemu) notify vnc-connect on
(qemu) notification: vnc connection ...
(qemu) notify migration-completion on
(qemu) migrate -d ...
notification: enospc on ide0-0
(qemu) migrate_cancel
notification: migration cancelled
(qemu) migrate -d ...
(qemu) notification: migration completed

What are the rules for printing out 'notification'? Do you want for the end of the buffer to be '\n' or '\n(qemu) '? If so, if I type:

(qemu) f

But don't hit enter, would that suppress notification?

I'd make everything line-oriented. Anything from the user up to \n is buffered and ignored until the \n arrives. Once the \n arrives, the command is acted upon atomically, either completing fully or launching an async notification. So the rules are: whenever the monitor is idle, a notification can be printed out.
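The "notification only when idle" rule can be sketched on the monitor side. This is a minimal hypothetical model (class and method names invented, not QEMU code): notifications raised while a command is being processed are queued, and flushed only once the monitor goes idle, right before the next prompt.

```python
class Monitor:
    """Toy line-oriented monitor that prints notifications only when idle."""

    def __init__(self):
        self.pending = []   # notifications held while a command runs
        self.output = []    # lines printed to the session
        self.busy = False

    def notify(self, event):
        line = f"notification: {event}"
        if self.busy:
            self.pending.append(line)   # monitor busy: hold it
        else:
            self.output.append(line)    # monitor idle: print immediately

    def run_command(self, name, handler):
        self.output.append(f"(qemu) {name}")  # echo the typed command line
        self.busy = True                      # leave idle: no notifications
        handler()
        self.busy = False
        self.output.extend(self.pending)      # flush queued notifications
        self.pending.clear()
        self.output.append("(qemu) ")         # idle again at the prompt

m = Monitor()
# An event raised mid-command is deferred until the command completes.
m.run_command("migrate -d", lambda: m.notify("enospc on ide0-0"))
# An event raised while idle is printed right away.
m.notify("migration completed")
```

Note that in this model "idle" is a state transition (prompt printed, no command in flight), not an inspection of output buffers, matching the description above.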