Re: [libvirt] NFS over RDMA small block DIRECT_IO bug

2012-09-05 Thread Avi Kivity
On 09/04/2012 03:04 PM, Myklebust, Trond wrote:
> On Tue, 2012-09-04 at 11:31 +0200, Andrew Holway wrote:
>> Hello.
>> 
>> # Avi Kivity avi(a)redhat recommended I copy kvm in on this. It would also 
>> seem relevant to libvirt. #
>> 
>> I have a CentOS 6.2 server and a CentOS 6.2 client.
>> 
>> [root@store ~]# cat /etc/exports 
>> /dev/shm 10.149.0.0/16(rw,fsid=1,no_root_squash,insecure)
>> (I have tried with non-tmpfs targets also)
>> 
>> 
>> [root@node001 ~]# cat /etc/fstab 
>> store.ibnet:/dev/shm /mnt nfs rdma,port=2050,defaults 0 0
>> 
>> 
>> I wrote a little for-loop one-liner that dd'd the CentOS net-install image
>> to a file called 'hello' and then checksummed that file. Each iteration
>> uses a different block size.
>> 
>> Non-DIRECT_IO seems to work fine. DIRECT_IO with 512-byte, 1K, and 2K block
>> sizes gets corrupted.
> 
> 
> That is expected behaviour. DIRECT_IO over RDMA needs to be page aligned
> so that it can use the more efficient RDMA READ and RDMA WRITE memory
> semantics (instead of the SEND/RECEIVE channel semantics).

Shouldn't subpage requests fail then?  O_DIRECT block requests fail for
subsector writes, instead of corrupting your data.

Hopefully this is documented somewhere.
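For reference, the reported test loop can be sketched roughly like this (paths, sizes, and the buffered-I/O fallback are illustrative, not from the original report; in the reporter's setup the destination file lived on the RDMA-mounted NFS share):

```shell
# Copy the same source file with dd at several block sizes, then
# compare checksums; with the reported bug, the sub-page O_DIRECT
# copies would produce differing (corrupted) results.
src=$(mktemp)
dst=$(mktemp)
dd if=/dev/zero of="$src" bs=1M count=4 2>/dev/null

for bs in 512 1024 2048 4096; do
    # oflag=direct requests O_DIRECT; fall back to buffered I/O if
    # the filesystem rejects it (e.g. tmpfs)
    dd if="$src" of="$dst" bs="$bs" oflag=direct 2>/dev/null ||
        dd if="$src" of="$dst" bs="$bs" 2>/dev/null
    md5sum "$dst"
done
```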

-- 
error compiling committee.c: too many arguments to function

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [User question] Huge buffer size on KVM host

2012-08-16 Thread Avi Kivity
On 08/16/2012 05:54 PM, Martin Wawro wrote:
> 
> On Aug 15, 2012, at 2:57 PM, Avi Kivity wrote:
> 
>>> 
>>> We are using logical volumes and the cache is set to 'none'.
>> 
>> Strange, that should work without any buffering.
>> 
>> What are the contents of
>> 
>>  /sys/block/sda/queue/hw_sector_size
>> 
>> and
>> 
>>  /sys/block/sda/queue/logical_block_size
>> 
>> ?
>> 
> 
> Hi Avi,
> 
> It seems that the kernel on that particular machine is too old; those
> entries are not present. We checked on a comparable setup with a newer
> kernel and both entries were set to 512.
> 
> We also took a third, more thorough look at the caching. It turns out that
> virt-manager does not seem to honor the caching mode adjusted in the GUI
> correctly. We disabled caching on all virtual devices for this particular VM,
> and checking with "ps -fxal" revealed that only one of those devices (and a
> rather small one too) had this set. We corrected this in the XML file
> directly, and the buffer size currently resides at around 1.8 GB after
> rebooting the VM (the only virtio device not having the cache=none option
> set is now the (non-mounted) cdrom).
> 

cc += libvirt-list

Is there a reason that cdroms don't get cache=none?
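For reference, cache=none is set per disk via the driver element in the domain XML; a minimal sketch (the device path is a placeholder, not from the report above):

```xml
<disk type='block' device='disk'>
  <!-- cache='none' disables host page-cache buffering for this disk -->
  <driver name='qemu' type='raw' cache='none'/>
  <source dev='/dev/vg0/guest01'/>  <!-- placeholder logical volume path -->
  <target dev='vda' bus='virtio'/>
</disk>
```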


-- 
error compiling committee.c: too many arguments to function



Re: [libvirt] QEMU CPU model versioning/compatibility (was Re: KVM call minutes July 31th)

2012-08-01 Thread Avi Kivity
On 07/31/2012 06:14 PM, Eduardo Habkost wrote:
> On Tue, Jul 31, 2012 at 04:32:05PM +0200, Juan Quintela wrote:
>> - 1.2 plans for CPU model versioning/compatibility (eduardo)
>>   (global properties vs QOM vs qdev)
>>   how to do it ?  configuration file?  moving back to the code?
>>   different external interface from internal one
> 
> (CCing libvir-list)
> 
> So, the problem is (please correct me if I am wrong):
> 
> - libvirt expects the CPU models from the current config file to be
>   provided by QEMU. libvirt won't write a cpudef config file.
> - Fixing compatibility problems on the CPU models (even the ones that
>   are in config files) is expected to be QEMU's responsibility.
> 
> Moving the CPU models inside the code is a step backwards, IMO. I don't
> think loading some kind of data from an external file (provided and
> maintained by QEMU itself) should be considered something special and
> magic, and make the data there a second-class citizen (that QEMU refuses
> to give proper support and keep compatibility).

I agree.

> But if it will make us stop running in circles every time something
> related to those files needs to be changed (remember the -no-user-config
> discussion?), I think I will accept that.

The issue is that we have a lot of machinery for backwards compatibility
in the code, and none in the cpu definitions parser.  Yes, we could mark up
the cpu definitions so they could be text-driven, but that's effort that
could be spent elsewhere.  Moving them to a data-driven C implementation
that can rely on the existing backwards-compat code is a good tradeoff IMO.
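For context, a cpudef stanza in the external config files under discussion looked roughly like this (field values here are illustrative, not an authoritative Westmere definition):

```
[cpudef]
   name = "Westmere"
   level = 11
   vendor = "GenuineIntel"
   family = 6
   model = 44
   stepping = 1
   feature_edx = "fpu vme de pse tsc msr pae mce cx8 apic"
```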

-- 
error compiling committee.c: too many arguments to function




Re: [libvirt] Constantly changing USB product ID

2012-03-28 Thread Avi Kivity
On 03/28/2012 02:41 PM, Avi Kivity wrote:
> On 03/27/2012 05:48 PM, Jaap Winius wrote:
> > Hi folks,
> >
> > Recently I learned how to configure KVM with USB pass-through
> > functionality. In my case I configured my guest domain with this block
> > of code:
> >
> > 
> >   
> > 
> > 
> > 
> >   
> > 
> >
> > At first this worked fine, but then later the guest domain refused to
> > start because the USB device was absent. When I checked, I found that
> > its product ID had mysteriously changed to 1771. Later it was back at
> > 1772. Now it appears that the USB device I am dealing with has a
> > product ID that changes back and forth between 1771 and 1772 at random.
> >
> > Apparently, the Windows program running on the guest domain is
> > designed to deal with this nonsense, but the question is, Can KVM be
> > configured to deal with it? Something like 
> > would be useful, but that doesn't work.
> >
> > Any ideas would be much appreciated.
> >
>
> This is really strange.  What kind of device is this?
>
> I've filed an RFE [1] for virt-manager for assigning USB host devices
> opportunistically; that is, if they're plugged in they're assigned, and if
> not the guest starts without them.  If it were implemented, you could
> assign both 0x1771 and 0x1772 and whichever ID the device is today would
> get assigned.
>
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=804432
>

btw, the correct place for this discussion is likely the libvirt mailing
list, or maybe the virt-manager list if it exists.
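The XML block stripped from the quoted message above was presumably a hostdev passthrough definition along these lines (a reconstructed sketch; the vendor id is a placeholder):

```xml
<hostdev mode='subsystem' type='usb' managed='yes'>
  <source>
    <vendor id='0x0471'/>   <!-- placeholder vendor id -->
    <product id='0x1772'/>  <!-- the id that flips between 0x1771 and 0x1772 -->
  </source>
</hostdev>
```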

-- 
error compiling committee.c: too many arguments to function



Re: [libvirt] [Qemu-devel] Modern CPU models cannot be used with libvirt

2012-03-28 Thread Avi Kivity
On 03/26/2012 09:00 PM, Anthony Liguori wrote:
>>> Yes, that's one reason.  But maybe a user wants to have a whole
>>> different set of machine types and doesn't care to have the ones we
>>> provide.  Why prevent a user from doing this?
>>
>> How are we preventing a user from doing it?  In what way is -nodefconfig
>> helping it?
>
>
> Let me explain it in a different way, perhaps.
>
> We launch smbd in QEMU in order to do file sharing over slirp.  One of
> the historic problems we've had is that we don't assume root
> privileges, yet want to be able to run smbd without using any of the
> system configuration files.
>
> You can do this by specifying -s with the config file, and then in the
> config file you can overload the various default paths (like private
> dir, lock dir, etc.). In some cases, earlier versions of smbd didn't
> allow you to change private dir.
>
> You should be able to tell a well behaved tool not to read any
> configuration/data files and explicitly tell it where/how to read
> them.  We cannot exhaustively anticipate every future use case of QEMU.

100% agree.  But that says nothing about a text file that defines
"westmere" as a set of cpu flags, as long as we allow the user to define
"mywestmere" as a different set.  That is because target-x86_64.cfg does
not configure anything, it just defines a macro, which qemu doesn't
force you to use.

>
> But beyond the justification for -nodefconfig, the fact is that it
> exists today, and has a specific semantic.  If we want to have a
> different semantic, we should introduce a new option (-no-user-config).

Sure.

-- 
error compiling committee.c: too many arguments to function



Re: [libvirt] [Qemu-devel] Modern CPU models cannot be used with libvirt

2012-03-28 Thread Avi Kivity
On 03/26/2012 09:03 PM, Anthony Liguori wrote:
>
> I think what we want to move toward is a -no-machine option which
> allows a user to explicitly build a machine from scratch.  That is:
>
> qemu -no-machine -device i440fx,id=host -device isa-serial,chr=chr0 ...
>

I'd call it -M bare-1.1, so that it can be used to override driver
properties in 1.2+.

So we'd have

  # default machine for this version
  qemu / qemu -M pc

  # an older version's pc
  qemu -M pc-1.1

  # just a chassis, bring your own screwdriver
  qemu -M bare

  # previous generation chassis, beige
  qemu -M bare-1.1

That is because -M not only specifies the components that go into the
machine, it also alters other devices you add to it.

This also helps preserve the planet's dwindling supply of command line
options.

-- 
error compiling committee.c: too many arguments to function



Re: [libvirt] [Qemu-devel] Modern CPU models cannot be used with libvirt

2012-03-26 Thread Avi Kivity
On 03/26/2012 01:24 PM, Jiri Denemark wrote:
> ...
> > > The command line becomes unstable if you use -nodefconfig.
> > 
> > -no-user-config solves this but I fully expect libvirt would continue to 
> > use 
> > -nodefconfig.
>
> Libvirt uses -nodefaults -nodefconfig because it wants to fully control what
> the virtual machine will look like (mainly in terms of devices). In other
> words, we don't want any devices to just magically appear without libvirt
> knowing about them. -nodefaults gets rid of default devices that are built
> directly in qemu. Since users can set any devices or command line options
> (such as enable-kvm) into qemu configuration files in @SYSCONFDIR@, we need to
> avoid reading those files as well. Hence we use -nodefconfig. However, we
> would still like qemu to read CPU definitions, machine types, etc. once they
> become externally loaded configuration (or whatever we decide to call it). That
> said, when CPU definitions are moved into @DATADIR@, and -no-user-config is
> introduced, I don't see any reason for libvirt to keep using -nodefconfig.
>
> I actually like
> -no-user-config
> more than
> -nodefconfig -readconfig @DATADIR@/...
> since it would avoid additional magic to detect what files libvirt should
> explicitly pass to -readconfig but basically any approach that would allow us
> to do read files only from @DATADIR@ is much better than what we have with
> -nodefconfig now.

That's how I see it as well.

-- 
error compiling committee.c: too many arguments to function



Re: [libvirt] [Qemu-devel] Modern CPU models cannot be used with libvirt

2012-03-26 Thread Avi Kivity
On 03/25/2012 08:11 PM, Anthony Liguori wrote:
>
>> I don't think -nodefconfig (as defined) is usable, since there is no way
>> for the user to tell what it means short of reading those files.
>
> *if the user doesn't know specifics about this QEMU version.
>
> You make the assumption that all users are going to throw arbitrary
> options at arbitrary QEMU versions.  That's certainly an important
> use-case but it's not the only one.

If a Fedora user is using qemu, then their qemu version will change
every six months.  Their options are to update their scripts/management
tool in step, or not have their management tool use -nodefconfig.

The same holds for anyone using qemu from upstream, since that's
approximately the qemu release cycle.

>
>> -no-user-config is usable; I think it also needs to mean that qemu
>> without -M/-cpu/-m options will error out?
>
> You're confusing -nodefaults (or something stronger than -nodefaults)
> with -no-user-config.
>

Right.

> Yes, the distinctions are confusing.  It's not all fixable tomorrow. 
> If we take my config refactoring series, we can get 90% of the way
> there soon, but Paolo has a more thorough refactoring.
>
 "#define westmere blah" is not configuration, otherwise the meaning of
 configuration will drift over time.

 -cpu blah is, of course.
>>>
>>> It's the same mechanism, but the above would create two classes of
>>> default configuration files and then it becomes a question of how
>>> they're used.
>>
>> Confused.
>
> We don't have a formal concept of -read-definition-config and
> -read-configuration-config
>
> There's no easy or obvious way to create such a concept either nor do
> I think the distinction is meaningful to users.

Definition files should be invisible to users.  They're part of the
implementation.  If we have a file that says

  pc-1.1 = piix + cirrus + memory(128) + ...

then it's nobody's business if it's in a text file or a .c file.

Of course it's nice to allow users to load their own definition files,
but that's strictly a convenience.

>> Exactly.  The types are no different, so there's no reason to
>> discriminate against types that happen to live in qemu-provided data
>> files vs. qemu code.  They aren't instantiated, so we lose nothing by
>> creating the factories (just so long as the factories aren't
>> mass-producing objects).
>
>
> At some point, I'd like to have type modules that are shared objects. 
> I'd like QEMU to start with almost no builtin types and allow the user
> to configure which modules get loaded.
>
> In the long term, I'd like QEMU to be a small, robust core with the
> vast majority of code relegated to modules with the user ultimately in
> control of module loading.
>
> Yes, I'd want some module autoloading system but there should always
> be a way to launch QEMU without loading any modules and then load a
> very specific set of modules (as defined by the user).
>
> You can imagine this being useful for something like Common Criteria
> certifications.

Okay.

> It's obviously defined for a given release, just not defined long term.
>
>> If I see something like -nodefconfig, I assume it will create a bare
>> bones guest that will not depend on any qemu defaults and will be stable
>> across releases.
>
> That's not even close to what -nodefconfig is.  That's pretty much
> what -nodefaults is but -nodefaults has also had a fluid definition
> historically.

Okay.  Let's just make sure to document -nodefconfig as version specific
and -nodefaults as the stable way to create a bare bones guest (and
define exactly what that means).

-- 
error compiling committee.c: too many arguments to function



Re: [libvirt] [Qemu-devel] Modern CPU models cannot be used with libvirt

2012-03-25 Thread Avi Kivity
On 03/25/2012 08:01 PM, Anthony Liguori wrote:
>> I don't think this came out of happiness, but despair.  Seriously,
>> keeping compatibility is one of the things we work hardest to achieve,
>> and we can't manage it for our command line?
>
>
> I hate to burst your bubble, but we struggle and rarely maintain the
> level of compatibility you're seeking to have.
>
> I agree with you that we need to do a better job maintaining
> compatibility which is why I'm trying to clearly separate the things
> that we will never break from the things that will change over time.
>
> -nodefconfig is a moving target.  If you want stability, don't use
> it.  If you just want to prevent the user's /etc/qemu stuff from being
> loaded, use -no-user-config.

Fine, but let's clearly document it as such.

Note that just saying it doesn't load any configuration files isn't
sufficient.  We have to say that it kills Westmere and some of its
friends, but preserves others like qemu64.  Otherwise it's impossible to
use it except by trial and error.

>
>>>
>>> I'm not saying that backwards compat isn't important--it is.  But
>>> there are users who are happy to live on the bleeding edge.
>>
>> That's fine, but I don't see how -nodefconfig helps them.  All it does
>> is take away the building blocks (definitions) that they can use when
>> setting up their configuration.
>
> Yes, this is a feature.

I don't see how, but okay.

>
 Suppose we define the southbridge via a configuration file.  Does that
 mean we don't load it any more?
>>>
>>> Yes.  If I want the leanest and meanest version of QEMU that will
>>> start in the smallest number of milliseconds, then being able to tell
>>> QEMU not to load configuration files and create a very specific
>>> machine is a Good Thing.  Why exclude users from being able to do this?
>>
>> So is this the point?  Reducing startup time?
>
> Yes, that's one reason.  But maybe a user wants to have a whole
> different set of machine types and doesn't care to have the ones we
> provide.  Why prevent a user from doing this?

How are we preventing a user from doing it?  In what way is -nodefconfig
helping it?

> Maybe they have a management tool that attempts to totally hide QEMU
> from the end user and exposes a different set of machine types.  It's
> certainly more convenient for something like the Android emulator to
> only have to deal with QEMU knowing about the 4 types of machines that
> it specifically supports.

If it supports four types, it should always pass one of them to qemu. 
The only thing -nodefconfig adds is breakage when qemu moves something
that one of those four machines relies on to a config file.

-- 
error compiling committee.c: too many arguments to function



Re: [libvirt] [Qemu-devel] Modern CPU models cannot be used with libvirt

2012-03-25 Thread Avi Kivity
On 03/25/2012 03:26 PM, Anthony Liguori wrote:
>>> We would continue to have Westmere/etc in QEMU exposed as part of the
>>> user configuration.  But I don't think it makes a lot of sense to have
>>> to modify QEMU any time a new CPU comes out.
>>
>> We have to.  New features often come with new MSRs which need to be live
>> migrated, and of course the cpu flags as well.  We may push all these to
>> qemu data files, but this is still qemu.  We can't let a management tool
>> decide that cpu feature X is safe to use on qemu version Y.
>
>
> I think QEMU should own CPU definitions.  

Agree.

> I think a management tool should have the choice of whether they are
> used though because they are a policy IMHO.
>
> It's okay for QEMU to implement some degree of policy as long as a
> management tool can override it with a different policy.

Sure.

We can have something like

  # default machine's westmere
  qemu -cpu westmere

  # pc-1.0's westmere
  qemu -M pc-1.0 -cpu westmere

  # pc-1.0's westmere, without nx
  qemu -M pc-1.0 -cpu westmere,-nx

  # specify everything in painful detail
  qemu -cpu
vendor=Foo,family=17,model=19,stepping=3,maxleaf=12,+fpu,+vme,leaf10eax=0x1234567,+etc


-- 
error compiling committee.c: too many arguments to function



Re: [libvirt] [Qemu-devel] Modern CPU models cannot be used with libvirt

2012-03-25 Thread Avi Kivity
On 03/25/2012 05:30 PM, Anthony Liguori wrote:
> On 03/25/2012 10:18 AM, Avi Kivity wrote:
>> On 03/25/2012 05:07 PM, Anthony Liguori wrote:
>>>> As long as qemu -nodefconfig -cpu westmere -M pc-1.1
>>>
>>>
>>> -nodefconfig is going to eventually mean that -cpu westmere and -M
>>> pc-1.1 will not work.
>>>
>>> This is where QEMU is going.  There is no reason that a normal user
>>> should ever use -nodefconfig.
>>
>> I don't think anyone or anything can use it, since its meaning is not
>> well defined.  "not read any configuration files" where parts of qemu
>> are continually moved out to configuration files means it's a moving
>> target.
>
> I think you assume that all QEMU users care about forward and
> backwards compatibility on the command line about all else.
>
> That's really not true.  The libvirt folks have stated repeatedly that
> command line backwards compatibility is not critical to them.  They
> are happy to require that a new version of QEMU requires a new version
> of libvirt.

I don't think this came out of happiness, but despair.  Seriously,
keeping compatibility is one of the things we work hardest to achieve,
and we can't manage it for our command line?

>
> I'm not saying that backwards compat isn't important--it is.  But
> there are users who are happy to live on the bleeding edge.

That's fine, but I don't see how -nodefconfig helps them.  All it does
is take away the building blocks (definitions) that they can use when
setting up their configuration.

>
>> Suppose we define the southbridge via a configuration file.  Does that
>> mean we don't load it any more?
>
> Yes.  If I want the leanest and meanest version of QEMU that will
> start in the smallest number of milliseconds, then being able to tell
> QEMU not to load configuration files and create a very specific
> machine is a Good Thing.  Why exclude users from being able to do this?

So is this the point?  Reducing startup time?

I can't say I see the reason to invest so much effort in shaving a
millisecond or less from this, but if we did want to, the way would be
lazy loading of the configuration where items are parsed as they are
referenced.

-- 
error compiling committee.c: too many arguments to function



Re: [libvirt] [Qemu-devel] Modern CPU models cannot be used with libvirt

2012-03-25 Thread Avi Kivity
On 03/25/2012 05:26 PM, Anthony Liguori wrote:
>> Put the emphasis around *configuration*.
>
>
> So how about:
>
> 1) Load ['@SYSCONFDIR@/qemu/qemu.cfg',
> '@SYSCONFDIR@/qemu/target-@ARCH@.cfg',
>  '@DATADIR@/system.cfg', '@DATADIR@/system-@ARCH@.cfg']
>
> 2) system-@ARCH@.cfg will contain:
>
> [system]
> readconfig=@DATADIR@/target-@ARCH@-cpus.cfg
> readconfig=@DATADIR@/target-@ARCH@-machine.cfg
>
> 3) -nodefconfig will not load any configuration files from DATADIR or
> SYSCONFDIR.  -no-user-config will not load any configuration files
> from SYSCONFDIR.

What, more options?

I don't think -nodefconfig (as defined) is usable, since there is no way
for the user to tell what it means short of reading those files.

-no-user-config is usable; I think it also needs to mean that qemu
without -M/-cpu/-m options will error out, since the default machine/cpu
types are default configuration.

>
>> "#define westmere blah" is not configuration, otherwise the meaning of
>> configuration will drift over time.
>>
>> -cpu blah is, of course.
>
> It's the same mechanism, but the above would create two classes of
> default configuration files and then it becomes a question of how
> they're used.

Confused.

>
 The file defines westmere as an alias for a grab bag of options.
 Whether it's loaded or not is immaterial, unless someone uses one of the
 names within.
>>>
>>> But you would agree, a management tool should be able to control
>>> whether class factories get loaded, right?
>>
>> No, why?  But perhaps I don't entirely get what you mean by "class
>> factories".
>>
>> Aren't they just implementations of
>>
>> virtual Device *new_instance(...) = 0?
>>
>> if so, why not load them?
>
> No, a class factory creates a new type of class.  -cpudef will
> ultimately call type_register() to create a new QOM visible type. 
> From a management tool's perspective, the type is no different than a
> built-in type.

Exactly.  The types are no different, so there's no reason to
discriminate against types that happen to live in qemu-provided data
files vs. qemu code.  They aren't instantiated, so we lose nothing by
creating the factories (just so long as the factories aren't
mass-producing objects).

>
 Otherwise, the meaning of -nodefconfig changes as more stuff is moved
 out of .c and into .cfg.
>>>
>>> What's the problem with this?
>>
>> The command line becomes unstable if you use -nodefconfig.
>
> -no-user-config solves this but I fully expect libvirt would continue
> to use -nodefconfig.


I don't see how libvirt can use -nodefconfig with the fluid meaning you
attach to it, or what it gains from it.

>>
>> -nodefconfig = create an empty machine, don't assume anything (=don't
>> read qemu.cfg) let me build it out of all those lego bricks.  Those can
>> be defined in code or in definition files in /usr/share, I don't care.
>>
>> Maybe that's -nodevices -vga none.  But in this case I don't see the
>> point in -nodefconfig.  Not loading target-x86_64.cfg doesn't buy the
>> user anything, since it wouldn't affect the guest in any way.
>
>
> -nodefconfig doesn't mean what you think it means.  -nodefconfig
> doesn't say anything about the user visible machine.
>
> -nodefconfig tells QEMU not to read any configuration files at start
> up.  This has an undefined effect on the user visible machine that
> depends on the specific version of QEMU.

Then it's broken.  How can anyone use something that has an undefined
effect?

If I see something like -nodefconfig, I assume it will create a bare
bones guest that will not depend on any qemu defaults and will be stable
across releases.  I don't think anyone will understand -nodefconfig to
be something version dependent without reading the qemu management tool
author's guide.

-- 
error compiling committee.c: too many arguments to function



Re: [libvirt] [Qemu-devel] Modern CPU models cannot be used with libvirt

2012-03-25 Thread Avi Kivity
On 03/25/2012 05:07 PM, Anthony Liguori wrote:
>> As long as qemu -nodefconfig -cpu westmere -M pc-1.1
>
>
> -nodefconfig is going to eventually mean that -cpu westmere and -M
> pc-1.1 will not work.
>
> This is where QEMU is going.  There is no reason that a normal user
> should ever use -nodefconfig.

I don't think anyone or anything can use it, since its meaning is not
well defined.  "not read any configuration files" where parts of qemu
are continually moved out to configuration files means it's a moving target.

Suppose we define the southbridge via a configuration file.  Does that
mean we don't load it any more?

-- 
error compiling committee.c: too many arguments to function



Re: [libvirt] [Qemu-devel] Modern CPU models cannot be used with libvirt

2012-03-25 Thread Avi Kivity
On 03/25/2012 04:59 PM, Anthony Liguori wrote:
> On 03/25/2012 09:46 AM, Avi Kivity wrote:
>> On 03/25/2012 04:36 PM, Anthony Liguori wrote:
>>>> Apart from the command line length, it confuses configuration with
>>>> definition.
>>>
>>>
>>> There is no distinction with what we have today.  Our configuration
>>> file basically corresponds to command line options and as there is no
>>> distinction in command line options, there's no distinction in the
>>> configuration format.
>>
>> We don't have command line options for defining, only configuring.
>
> That's an oversight.  There should be a -cpudef option.  It's a
> QemuOptsList.
>
>> Again, defining = #define
>
> I think -global fits your definition of #define...

Yes (apart from the corner case of modifying a default-instantiated device).

>>> B) A management tool has complete control over cpu definitions without
>>> modifying the underlying filesystem.  -nodefconfig will prevent it
>>> from loading and the management tool can explicitly load the QEMU
>>> definition (via -readconfig, potentially using a /dev/fd/N path) or it
> can define its own cpu definitions.
>>
>> Why does -nodefconfig affect anything?
>
>
> Because -nodefconfig means "don't load *any* default configuration
> files".

Put the emphasis around *configuration*.

"#define westmere blah" is not configuration, otherwise the meaning of
configuration will drift over time.

-cpu blah is, of course.

>
>> The file defines westmere as an alias for a grab bag of options.
>> Whether it's loaded or not is immaterial, unless someone uses one of the
>> names within.
>
> But you would agree, a management tool should be able to control
> whether class factories get loaded, right?  

No, why?  But perhaps I don't entirely get what you mean by "class
factories".

Aren't they just implementations of

   virtual Device *new_instance(...) = 0?
  
if so, why not load them?

> So what's the mechanism to do this?
>
>>> C) This model maps to any other type of class factory.  Machines will
>>> eventually be expressed as a class factory.  When we implement this,
>>> we would change the default target-x86_64-cpu.cfg to:
>>>
>>> [system]
>>> # Load default CPU definitions
>>> readconfig = @DATADIR@/target-x86_64-cpus.cfg
>>> # Load default machines
>>> readconfig = @DATADIR@/target-x86_64-machines.cfg
>>>
>>> A machine definition would look like:
>>>
>>> [machinedef]
>>>   name = pc-0.15
>>>   virtio-blk.class_code = 32
>>>   ...
>>>
>>> Loading a file based on -cpu doesn't generalize well unless we try to
>>> load a definition for any possible QOM type to find the class factory
>>> for it.  I don't think this is a good idea.
>>
>> Why not load all class factories?  Just don't instantiate any objects.
>
> Unless we have two different config syntaxes, I think it will lead to
> a lot of confusion.  Having some parts of a config file be parsed and
> others not is fairly strange.

Parse all of them (and make sure all are class factories).

The only real configuration item is that without -nodefconfig, we create
a -M pc-1.1 system.  Everything else derives from that.

>
>> Otherwise, the meaning of -nodefconfig changes as more stuff is moved
>> out of .c and into .cfg.
>
> What's the problem with this?

The command line becomes unstable if you use -nodefconfig.

>>>
>>> In my target-$(ARCH).cfg, I have:
>>>
>>> [machine]
>>> enable-kvm = "on"
>>>
>>> Which means I don't have to use -enable-kvm anymore.  But if you look
>>> at a tool like libguestfs, start up time is the most important thing
>>> so avoiding unnecessary I/O and processing is critical.
>>
>> So this is definitely configuration (applies to the current instance) as
>> opposed to target-x86_64.cfg, which doesn't.
>  
>
> I'm not sure which part you're responding to.

I was saying that target-x86_64.cfg appears to be definitions, not
configuration, and was asking about qemu.cfg (which is configuration).

>> As far as I can tell, the only difference is that -nodefconfig -cpu
>> westmere will error out instead of working.  But if you don't supply
>> -cpu westmere, the configuration is identical.
>
> What configuration?
>
> Let me ask, what do you think the semantics of -nodefconfig should
> be?  I'm not sure I understand what you're advocating for.
>

-nodefconfig = create an empty machine, don't assume anything (=don't
read qemu.cfg) let me build it out of all those lego bricks.  Those can
be defined in code or in definition files in /usr/share, I don't care.

Maybe that's -nodevices -vga none.  But in this case I don't see the
point in -nodefconfig.  Not loading target-x86_64.cfg doesn't buy the
user anything, since it wouldn't affect the guest in any way.

-- 
error compiling committee.c: too many arguments to function



Re: [libvirt] [Qemu-devel] Modern CPU models cannot be used with libvirt

2012-03-25 Thread Avi Kivity
On 03/25/2012 04:36 PM, Anthony Liguori wrote:
>> Apart from the command line length, it confuses configuration with
>> definition.
>
>
> There is no distinction with what we have today.  Our configuration
> file basically corresponds to command line options and as there is no
> distinction in command line options, there's no distinction in the
> configuration format.

We don't have command line options for defining, only configuring.

Again, defining = #define
Configuring = modifying current instance

>
>> target-x86_64-cpus.cfg does not configure qemu for anything, it's merely
>> the equivalent of
>>
>>#define westmere (x86_def_t) { ... }
>>#define nehalem (x86_def_t) { ... }
>>#define bulldozer (x86_def_t) { ... } // for PC
>>
>> so it should be read at each invocation.  On the other hand, pc.cfg and
>> westmere.cfg (as used previously) are shorthand for
>>
>> machine = (QEMUMachine) { ... };
>> cpu = (x86_def_t) { ... };
>>
>> so they should only be read if requested explicitly (or indirectly).
>
> This doesn't make a lot of sense to me.  Here's what I'm proposing:
>
> 1) QEMU would have a target-x86_64-cpu.cfg.in that is installed by
> default in /etc/qemu.  It would contain:
>
> [system]
> # Load default CPU definitions
> readconfig = @DATADIR@/target-x86_64-cpus.cfg
>
> 2) target-x86_64-cpus.cfg would be installed to @DATADIR@ and would
> contain:
>
> [cpudef]
>   name = "Westmere"
>   ...
>
> This has the following properties:
>
> A) QEMU has no builtin notion of CPU definitions.  It just has a "cpu
> factory".  -cpudef will create a new class called Westmere that can
> then be enumerated through qom-type-list and created via qom-create.
>
> B) A management tool has complete control over cpu definitions without
> modifying the underlying filesystem.  -nodefconfig will prevent it
> from loading and the management tool can explicitly load the QEMU
> definition (via -readconfig, potentially using a /dev/fd/N path) or it
> can define it's own cpu definitions.

Why does -nodefconfig affect anything?

The file defines westmere as an alias for a grab bag of options. 
Whether it's loaded or not is immaterial, unless someone uses one of the
names within.

>
> C) This model maps to any other type of class factory.  Machines will
> eventually be expressed as a class factory.  When we implement this,
> we would change the default target-x86_64-cpu.cfg to:
>
> [system]
> # Load default CPU definitions
> readconfig = @DATADIR@/target-x86_64-cpus.cfg
> # Load default machines
> readconfig = @DATADIR@/target-x86_64-machines.cfg
>
> A machine definition would look like:
>
> [machinedef]
>  name = pc-0.15
>  virtio-blk.class_code = 32
>  ...
>
> Loading a file based on -cpu doesn't generalize well unless we try to
> load a definition for any possible QOM type to find the class factory
> for it.  I don't think this is a good idea.

Why not load all class factories?  Just don't instantiate any objects.

Otherwise, the meaning of -nodefconfig changes as more stuff is moved
out of .c and into .cfg.

>
>>>> The reasoning is, loading target-x86_64-cpus.cfg does not alter the
>>>> current instance's configuration, so reading it doesn't violate
>>>> -nodefconfig.
>>>
>>> I think we have a different view of what -nodefconfig does.
>>>
>>> We have a couple options today:
>>>
>>> -nodefconfig
>>>
>>> Don't read the default configuration files.  By default, we read
>>> /etc/qemu/qemu.cfg and /etc/qemu/target-$(ARCH).cfg
>>>
>>
>> The latter seems meaningless to avoid reading.  It's just a set of
>> #defines, what do you get by not reading it?
>
> In my target-$(ARCH).cfg, I have:
>
> [machine]
> enable-kvm = "on"
>
> Which means I don't have to use -enable-kvm anymore.  But if you look
> at a tool like libguestfs, start up time is the most important thing
> so avoiding unnecessary I/O and processing is critical.

So this is definitely configuration (applies to the current instance) as
opposed to target-x86_64.cfg, which doesn't.

>
>>> -nodefaults
>>>
>>> Don't create default devices.
>>>
>>> -vga none
>>>
>>> Don't create the default VGA device (not covered by -nodefaults).
>>>
>>> With these two options, the semantics you get an absolutely
>>> minimalistic instance of QEMU.  Tools like libguestfs really want to
>>> create the simplest guest and do the least amount of processing so the
>>> guest runs as fast as possible.
>>>
>>> It does suck a lot that this isn't a single option.  I would much
>>> prefer -nodefaults to be implied by -nodefconfig.  Likewise, I would
>>> prefer that -nodefaults implied -vga none.
>>
>> I don't have a qemu.cfg so can't comment on it, but in what way does
>> reading target-x86_64.cfg affect the current instance (that is, why is
>> -nodefconfig needed over -nodefaults -vga look-at-the-previous-option?)
>
> It depends on what the user configures it to do.

How?

As far as I can tell, the only difference is that -nodefconfig -cpu
westmere will error out instead of working.  But if you don't supply
-cpu westmere, the configuration is identical.

Re: [libvirt] [Qemu-devel] Modern CPU models cannot be used with libvirt

2012-03-25 Thread Avi Kivity
On 03/25/2012 03:22 PM, Anthony Liguori wrote:
>>>> In that case
>>>>
>>>>   qemu -cpu westmere
>>>>
>>>> is shorthand for -readconfig /usr/share/qemu/cpus/westmere.cfg.
>>>
>>>
>>> This is not a bad suggestion, although it would make -cpu ? a bit
>>> awkward.  Do you see an advantage to this over having
>>> /usr/share/qemu/target-x86_64-cpus.cfg that's read early on?
>>
>> Nope.  As long as qemu -nodefconfig -cpu westmere works, I'm happy.
>
>
> Why?  What's wrong with:
>
> qemu -nodefconfig -readconfig
> /usr/share/qemu/cpus/target-x86_64-cpus.cfg \
>  -cpu westmere
>
> And if that's not okay, would:
>
> qemu -nodefconfig -nocpudefconfig -cpu Westmere
>
> Not working be a problem?

Apart from the command line length, it confuses configuration with
definition.

target-x86_64-cpus.cfg does not configure qemu for anything, it's merely
the equivalent of

  #define westmere (x86_def_t) { ... }
  #define nehalem (x86_def_t) { ... }
  #define bulldozer (x86_def_t) { ... } // for PC

so it should be read at each invocation.  On the other hand, pc.cfg and
westmere.cfg (as used previously) are shorthand for

   machine = (QEMUMachine) { ... };
   cpu = (x86_def_t) { ... };

so they should only be read if requested explicitly (or indirectly).

>
>> The reasoning is, loading target-x86_64-cpus.cfg does not alter the
>> current instance's configuration, so reading it doesn't violate
>> -nodefconfig.
>
> I think we have a different view of what -nodefconfig does.
>
> We have a couple options today:
>
> -nodefconfig
>
> Don't read the default configuration files.  By default, we read
> /etc/qemu/qemu.cfg and /etc/qemu/target-$(ARCH).cfg
>

The latter seems meaningless to avoid reading.  It's just a set of
#defines, what do you get by not reading it?

> -nodefaults
>
> Don't create default devices.
>
> -vga none
>
> Don't create the default VGA device (not covered by -nodefaults).
>
> With these two options, the semantics you get an absolutely
> minimalistic instance of QEMU.  Tools like libguestfs really want to
> create the simplest guest and do the least amount of processing so the
> guest runs as fast as possible.
>
> It does suck a lot that this isn't a single option.  I would much
> prefer -nodefaults to be implied by -nodefconfig.  Likewise, I would
> prefer that -nodefaults implied -vga none.

I don't have a qemu.cfg so can't comment on it, but in what way does
reading target-x86_64.cfg affect the current instance (that is, why is
-nodefconfig needed over -nodefaults -vga look-at-the-previous-option?)

>
>>>>> I think the thread has reduced to: should /usr/share configuration
>>>>> files be read by default or just treated as additional configuration
>>>>> files.
>>>>
>>>> If they're read as soon as they're referenced, what's the difference?
>>>
>>> I suspect libvirt would not be happy with reading configuration files
>>> on demand..
>>
>> Why not?
>
> It implies a bunch of SELinux labeling to make sVirt work.  libvirt
> tries very hard to avoid having QEMU read *any* files at all when it
> starts up.

The /usr/share/qemu files should be statically labelled to allow qemu to
read them, so we can push more code into data files.

-- 
error compiling committee.c: too many arguments to function

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [Qemu-devel] Modern CPU models cannot be used with libvirt

2012-03-25 Thread Avi Kivity
On 03/11/2012 04:12 PM, Anthony Liguori wrote:
>> Let me elaborate about the later. Suppose host CPU has kill_guest
>> feature and at the time a guest was installed it was not implemented by
>> kvm. Since it was not implemented by kvm it was not present in vcpu
>> during installation and the guest didn't install "workaround kill_guest"
>> module. Now unsuspecting user upgrades the kernel and tries to restart
>> the guest and fails. He writes angry letter to qemu-devel and is
>> asked to
>> reinstall his guest and move along.
>
>
> -cpu best wouldn't solve this.  You need a read/write configuration
> file where QEMU probes the available CPU and records it to be used for
> the lifetime of the VM.

This doesn't work with live migration, and makes templating harder.  The
only persistent storage we can count on are disk images.

The current approach is simple.  The management tool determines the
configuration, qemu applies it.  Unidirectional information flow.  This
also lends itself to the management tool scanning a cluster and
determining a GCD.

> This discussion isn't about whether QEMU should have a Westmere
> processor definition.  In fact, I think I already applied that patch.
>
> It's a discussion about how we handle this up and down the stack.
>
> The question is who should define and manage CPU compatibility.  Right
> now QEMU does to a certain degree, libvirt discards this and does its
> own thing, and VDSM/ovirt-engine assume that we're providing something
> and has built a UI around it.
>
> What I'm proposing we consider: have VDSM manage CPU definitions in
> order to provide a specific user experience in ovirt-engine.
>
> We would continue to have Westmere/etc in QEMU exposed as part of the
> user configuration.  But I don't think it makes a lot of sense to have
> to modify QEMU any time a new CPU comes out.

We have to.  New features often come with new MSRs which need to be live
migrated, and of course the cpu flags as well.  We may push all these to
qemu data files, but this is still qemu.  We can't let a management tool
decide that cpu feature X is safe to use on qemu version Y.


-- 
error compiling committee.c: too many arguments to function

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [Qemu-devel] Modern CPU models cannot be used with libvirt

2012-03-25 Thread Avi Kivity
On 03/25/2012 03:12 PM, Anthony Liguori wrote:
>>> qemu -M pc
>>>
>>> Would effectively be short hand for -readconfig
>>> /usr/share/qemu/machines/pc.cfg
>>
>> In that case
>>
>>   qemu -cpu westmere
>>
>> is shorthand for -readconfig /usr/share/qemu/cpus/westmere.cfg.
>
>
> This is not a bad suggestion, although it would make -cpu ? a bit
> awkward.  Do you see an advantage to this over having
> /usr/share/qemu/target-x86_64-cpus.cfg that's read early on?

Nope.  As long as qemu -nodefconfig -cpu westmere works, I'm happy.

The reasoning is, loading target-x86_64-cpus.cfg does not alter the
current instance's configuration, so reading it doesn't violate
-nodefconfig.

>>> I think the thread has reduced to: should /usr/share configuration
>>> files be read by default or just treated as additional configuration
>>> files.
>>
>> If they're read as soon as they're referenced, what's the difference?
>
> I suspect libvirt would not be happy with reading configuration files
> on demand..

Why not?

-- 
error compiling committee.c: too many arguments to function

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [Qemu-devel] Modern CPU models cannot be used with libvirt

2012-03-25 Thread Avi Kivity
On 03/25/2012 02:55 PM, Anthony Liguori wrote:
>> If cpu models are not part of configuration they should not be affected
>> by configuration mechanism. You are just avoiding addressing the real
>> question that if asked above.
>
>
> I think you're just refusing to listen.
>
> The stated direction of QEMU, for literally years now, is that we want
> to arrive at the following:
>
> QEMU is composed of a series of objects whose relationships can be
> fully described by an external configuration file.  Much of the
> current baked in concepts (like machines) would then become
> configuration files.
>
> qemu -M pc
>
> Would effectively be short hand for -readconfig
> /usr/share/qemu/machines/pc.cfg

In that case

 qemu -cpu westmere

is shorthand for -readconfig /usr/share/qemu/cpus/westmere.cfg.

> I think the thread has reduced to: should /usr/share configuration
> files be read by default or just treated as additional configuration
> files.

If they're read as soon as they're referenced, what's the difference?

-- 
error compiling committee.c: too many arguments to function

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions

2011-11-15 Thread Avi Kivity
On 11/14/2011 11:58 AM, Kevin Wolf wrote:
> Am 12.11.2011 11:25, schrieb Avi Kivity:
> > On 11/11/2011 12:15 PM, Kevin Wolf wrote:
> >> Am 10.11.2011 22:30, schrieb Anthony Liguori:
> >>> Live migration with qcow2 or any other image format is just not going to 
> >>> work 
> >>> right now even with proper clustered storage.  I think doing a block 
> >>> level flush 
> >>> cache interface and letting block devices decide how to do it is the best 
> >>> approach.
> >>
> >> I would really prefer reusing the existing open/close code. It means
> >> less (duplicated) code, is existing code that is well tested and doesn't
> >> make migration much of a special case.
> >>
> >> If you want to avoid reopening the file on the OS level, we can reopen
> >> only the topmost layer (i.e. the format, but not the protocol) for now
> >> and in 1.1 we can use bdrv_reopen().
> > 
> > Intuitively I dislike _reopen style interfaces.  If the second open
> > yields different results from the first, does it invalidate any
> > computations in between?
>
> Not sure what results and what computation you mean,

Result = open succeeded.  Computation = anything that derives from the
image, like size, or reading some stuff to guess CHS or something.

>  but let me clarify
> a bit about bdrv_reopen:
>
> The main purpose of bdrv_reopen() is to change flags, for example toggle
> O_SYNC during runtime in order to allow the guest to toggle WCE. This
> doesn't necessarily mean a close()/open() sequence if there are other
> means to change the flags, like fcntl() (or even using other protocols
> than files).
>
> The idea here was to extend this to invalidate all caches if some
> specific flag is set. As you don't change any other flag, this will
> usually not be a reopen on a lower level.
>
> If we need to use open() though, and it fails (this is really the only
> "different" result that comes to mind)

(yes)

>  then bdrv_reopen() would fail and
> the old fd would stay in use. Migration would have to fail, but I don't
> think this case is ever needed for reopening after migration.

Okay.

>
> > What's wrong with just delaying the open?
>
> Nothing, except that with today's code it's harder to do.
>

This has never stopped us (though it may delay us).

-- 
error compiling committee.c: too many arguments to function

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions

2011-11-12 Thread Avi Kivity
On 11/12/2011 03:39 PM, Anthony Liguori wrote:
> On 11/12/2011 04:27 AM, Avi Kivity wrote:
>> On 11/11/2011 04:03 PM, Anthony Liguori wrote:
>>>
>>> I don't view not supporting migration with image formats as a
>>> regression as it's never been a feature we've supported.  While there
>>> might be confusion about support around NFS, I think it's always been
>>> clear that image formats cannot be used.
>>
>> Was there ever a statement to that effect?  It was never clear to me and
>> I doubt it was clear to anyone.
>
> You literally reviewed a patch whose subject was "block: allow
> migration to work with image files"[1] that explained in gory detail
> what the problem was.
>
> [1] http://mid.gmane.org/4c8cad7c.5020...@redhat.com
>

Isn't a patch fixing a problem with migrating image files a statement
that we do support migrating image files?


>>
>>>
>>> Given that, I don't think this is a candidate for 1.0.
>>>
>>
>> Let's just skip 1.0 and do 1.1 instead.
>
> Let's stop being overly dramatic.  You know as well as anyone that
> image format support up until the coroutine conversion has had enough
> problems that no one could practically be using them in a production
> environment.

They are used in production environments.

>
> Live migration is an availability feature.  Up until the 1.0 release,
> if you cared about availability and correctness, you would not be
> using an image format.
>

Nevertheless, people who care about both availability and correctness,
do use image formats.  In reality, migration and image formats are
critical features for virtualization workloads.  Pretending they're not
makes the 1.0 release a joke.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions

2011-11-12 Thread Avi Kivity
On 11/11/2011 12:15 PM, Kevin Wolf wrote:
> Am 10.11.2011 22:30, schrieb Anthony Liguori:
> > Live migration with qcow2 or any other image format is just not going to 
> > work 
> > right now even with proper clustered storage.  I think doing a block level 
> > flush 
> > cache interface and letting block devices decide how to do it is the best 
> > approach.
>
> I would really prefer reusing the existing open/close code. It means
> less (duplicated) code, is existing code that is well tested and doesn't
> make migration much of a special case.
>
> If you want to avoid reopening the file on the OS level, we can reopen
> only the topmost layer (i.e. the format, but not the protocol) for now
> and in 1.1 we can use bdrv_reopen().
>

Intuitively I dislike _reopen style interfaces.  If the second open
yields different results from the first, does it invalidate any
computations in between?

What's wrong with just delaying the open?

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions

2011-11-12 Thread Avi Kivity
On 11/11/2011 04:03 PM, Anthony Liguori wrote:
>
> I don't view not supporting migration with image formats as a
> regression as it's never been a feature we've supported.  While there
> might be confusion about support around NFS, I think it's always been
> clear that image formats cannot be used.

Was there ever a statement to that effect?  It was never clear to me and
I doubt it was clear to anyone.

>
> Given that, I don't think this is a candidate for 1.0.
>

Let's just skip 1.0 and do 1.1 instead.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [Qemu-devel] [PATCH v4] XBZRLE delta for live migration of large memory apps

2011-08-11 Thread Avi Kivity

On 08/11/2011 12:16 PM, Daniel P. Berrange wrote:

On Thu, Aug 11, 2011 at 11:17:09AM +0300, Avi Kivity wrote:
>  On 08/10/2011 10:27 PM, Anthony Liguori wrote:
>  >>This may be acceptable, wait until the entire migration cluster is
>  >>xzbrle capable before enabling it. If not, add a monitor command.
>  >
>  >
>  >1) xzbrle needs to be disabled by default.  That way management
>  >tools don't unknowingly enable it by not passing -no-xzbrle.
>
>  We could hook it to -M, though it's a bit gross.

That would needlessly prevent its use for any existing installed
guests with an older machine type, which are running in a new QEMU


You could still enable it explicitly; I'm just trying to get it to be 
enabled by default.



Some kind of monitor capabilities seems good to me.



Live migration is probably mostly done in managed environments, so I 
think you're right.


--
error compiling committee.c: too many arguments to function

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [Qemu-devel] [PATCH v4] XBZRLE delta for live migration of large memory apps

2011-08-11 Thread Avi Kivity

On 08/10/2011 10:27 PM, Anthony Liguori wrote:

This may be acceptable, wait until the entire migration cluster is
xzbrle capable before enabling it. If not, add a monitor command.



1) xzbrle needs to be disabled by default.  That way management tools 
don't unknowingly enable it by not passing -no-xzbrle.


We could hook it to -M, though it's a bit gross.  Otherwise we need to 
document this clearly in the management tool author's guide.




3) a management tool should be able to query the source and 
destination, and then enable xzbrle if both sides support it.


You can argue that (3) could be static.  A command could be added to 
toggle it dynamically through the monitor.


But no matter what, someone has to touch libvirt and any other tool 
that works with QEMU to make this thing work.  But this is a general 
problem.  Any optional change to the migration protocol has exactly 
the same characteristics whether it's XZBRLE, XZBRLE v2 (if there is a 
v2), ASN.1, or any other form of compression that rolls around.


If we have two-way communication we can do this transparently in the 
protocol itself.




Instead of teaching management tools how to deal with all of these 
things, let's just fix this problem once.  It just takes:


a) A query-migration-caps command that returns a dict with two lists 
of strings.  Something like:


{ 'execute': 'query-migration-caps' }
{ 'return' : { 'capabilities': [ 'xbzrle' ], 'current': [] } }

b) A set-migration-caps command that takes a list of strings.  It 
simply takes the intersection of the capabilities set with the 
argument and sets the current set to the result.  Something like:


{ 'execute': 'set-migration-caps', 'arguments': { 'set': [ 'xbzrle' ] }}
{ 'return' : {} }

c) An internal interface to register a capability and an internal 
interface to check if a capability is currently enabled.  The xzbrle 
code just needs to disable itself if the capability isn't set.


Then we teach libvirt (and other tools) to query the caps list on the 
source, set the destination, query the current set on the destination, 
and then set that set on the source.


This is only if the capability has no side effect.
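The four-step handshake just described, together with the intersection
semantics of the proposed set-migration-caps command, can be sketched in
a few lines.  Note the command names and semantics here come from the
proposal in this thread, not from a shipped QMP API:

```python
# Sketch of the capability negotiation proposed in this thread.  The
# command names (query-migration-caps / set-migration-caps) and the
# intersection semantics are from the proposal, not a shipped QMP API.

class MigrationEndpoint:
    def __init__(self, capabilities):
        self.capabilities = set(capabilities)   # what this QEMU supports
        self.current = set()                    # what is currently enabled

    def query_caps(self):
        return {"capabilities": sorted(self.capabilities),
                "current": sorted(self.current)}

    def set_caps(self, requested):
        # "takes the intersection of the capabilities set with the
        # argument and sets the current set to the result"
        self.current = self.capabilities & set(requested)
        return {}

def negotiate(source, destination):
    """The four steps: query the source, set the destination, query
    what stuck, mirror the agreed set back on the source."""
    caps = source.query_caps()["capabilities"]
    destination.set_caps(caps)
    agreed = destination.query_caps()["current"]
    source.set_caps(agreed)
    return source.current

src = MigrationEndpoint(["xbzrle", "asn1"])
dst = MigrationEndpoint(["xbzrle"])
assert negotiate(src, dst) == {"xbzrle"}  # only what both sides support
```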



As we introduce new things, like the next great compression protocol, 
or ASN.1, we don't need to touch libvirt again.  libvirt can still 
know about the caps and selectively override QEMU if it's so inclined 
but it prevents us from reinventing the same mechanisms over and over 
again.


Right.



Yes.  But that negotiation needs to become part of the "protocol" for 
migration.  In the absence of that negotiation, we need to use the 
wire protocol we use today.  We cannot have ad-hoc feature negotiation 
for every change we make to the wire protocol.


Okay, as long as we have someone willing to implement it.

--
error compiling committee.c: too many arguments to function

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [Qemu-devel] [PATCH v4] XBZRLE delta for live migration of large memory apps

2011-08-08 Thread Avi Kivity

On 08/08/2011 05:04 PM, Daniel P. Berrange wrote:

My main concern with all these scenarios where libvirt touches the
actual data stream though is that we're introducing extra data copies
into the migration path which potentially waste CPU cycles.
If QEMU can directly XBZRLE encode data into the FD passed via 'fd:'
then we minimize data copies. Whether this is a big enough benefit
to offset the burden of having to maintain various compression code
options in QEMU I can't answer.



It's counterproductive to force an unneeded data copy in order to 
increase bandwidth.


--
error compiling committee.c: too many arguments to function

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [Qemu-devel] [PATCH v4] XBZRLE delta for live migration of large memory apps

2011-08-08 Thread Avi Kivity

On 08/08/2011 05:33 PM, Anthony Liguori wrote:

If we have a shared object helper, the thread should be maintained by
qemu proper, not the plugin.

I wouldn't call it "migration transport", but instead a
compression/decompression plugin.

I don't think it merits a plugin at all though. There's limited scope
for compression and it best sits in qemu proper. If anything, it needs
to be more integrated (for example turning itself off if it doesn't
match enough).



That adds a tremendous amount of complexity to QEMU. 


Tremendous?  You exaggerate.  It's a lot simpler than the block or char 
layers, for example.


If we're going to change our compression algorithm, we would need to 
use a single algorithm that worked well for a wide variety of workloads.


That algorithm will have to include XBZRLE as a subset, since it matches 
what workloads actually do (touch memory sparsely).




We struggle enough with migration as it is, it only would get worse if 
we have 10 different algorithms that we were dynamically 
enabling/disabling.


The other option is to allow 1-off compression algorithms in the form 
of plugins.  I think in this case, plugins are a pretty good 
compromise in terms of isolating complexity while allowing something 
that at least works very well for one particular type of workload.


I think you underestimate the generality of XBZRLE (or maybe I'm 
overestimating it?).  It's not reasonable to ask users to match a 
compression algorithm to their workload; most times they won't be 
interacting with the host at all.  We need compression to be enabled at 
all time, turning itself off if it finds it isn't effective so it can 
consume less cpu.


--
error compiling committee.c: too many arguments to function

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [Qemu-devel] [PATCH v4] XBZRLE delta for live migration of large memory apps

2011-08-08 Thread Avi Kivity

On 08/08/2011 05:15 PM, Anthony Liguori wrote:




I think workload aware migration compression is possible for a lot of
different types of workloads. That makes me a bit wary of QEMU growing
quite a lot of compression mechanisms.

It makes me think that this logic may really belong at a higher level
where more information is known about the workload. For instance, I
can imagine XBZRLE living in something like libvirt.


A better model would be plugin based.



exec helpers are plugins.  They just live in a different address space
and use a channel (a pipe) to exchange data.


libvirt isn't an exec helper.



If we did .so plugins, which I'm really not opposed to, I'd want the 
interface to be something like:


typedef struct MigrationTransportClass
{
   ssize_t (*writev)(MigrationTransport *obj,
 struct iovec *iov,
 int iovcnt);
} MigrationTransportClass;

I think it's useful to use an interface like this because it makes it 
easy to put the transport in a dedicated thread that didn't hold 
qemu_mutex (which is sort of equivalent to using a fork'd helper but 
is zero-copy at the expense of less isolation).


If we have a shared object helper, the thread should be maintained by 
qemu proper, not the plugin.


I wouldn't call it "migration transport", but instead a 
compression/decompression plugin.


I don't think it merits a plugin at all though.  There's limited scope 
for compression and it best sits in qemu proper.  If anything, it needs 
to be more integrated (for example turning itself off if it doesn't 
match enough).


--
error compiling committee.c: too many arguments to function

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [Qemu-devel] [PATCH v4] XBZRLE delta for live migration of large memory apps

2011-08-08 Thread Avi Kivity

On 08/08/2011 04:41 PM, Alexander Graf wrote:

In general, I believe it's a good idea to keep looking at libvirt as a vm 
management layer and only a vm management layer.


Very much yes.

--
error compiling committee.c: too many arguments to function

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [Qemu-devel] [PATCH v4] XBZRLE delta for live migration of large memory apps

2011-08-08 Thread Avi Kivity

On 08/08/2011 04:29 PM, Anthony Liguori wrote:


One thing that strikes me about this algorithm is that it's very good 
for a particular type of workload--shockingly good really.


Poking bytes at random places in memory is fairly generic.  If you have 
a lot of small objects, and modify a subset of them, this is the pattern 
you get.
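That access pattern is what makes an XOR-plus-zero-run-length delta
effective.  A toy sketch of the idea (simplified; not QEMU's actual
XBZRLE encoder or wire format):

```python
# Toy XBZRLE-style delta: XOR the old and new copies of a page, then
# run-length encode the zero runs.  Sparse random pokes compress well
# because almost all of the XOR image is zeros.  Simplified sketch,
# not QEMU's actual encoder or wire format.

def encode(old: bytes, new: bytes) -> list:
    xor = bytes(a ^ b for a, b in zip(old, new))
    out, i = [], 0
    while i < len(xor):
        j = i
        if xor[i] == 0:                  # zero run: store only its length
            while j < len(xor) and xor[j] == 0:
                j += 1
            out.append(("zrun", j - i))
        else:                            # non-zero run: store the bytes
            while j < len(xor) and xor[j] != 0:
                j += 1
            out.append(("data", xor[i:j]))
        i = j
    return out

def decode(old: bytes, delta: list) -> bytes:
    xor = bytearray()
    for kind, val in delta:
        xor.extend(b"\x00" * val if kind == "zrun" else val)
    return bytes(a ^ b for a, b in zip(old, xor))

old = bytes(4096)                        # a 4 KiB page of zeros
new = bytearray(old)
new[100] = 0xAB                          # "poking bytes at random places"
new[3000:3004] = b"\x01\x02\x03\x04"
delta = encode(old, bytes(new))
assert decode(old, delta) == bytes(new)  # round-trips
assert len(delta) == 5                   # two tiny data runs, three zero runs
```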




I think workload aware migration compression is possible for a lot of 
different types of workloads.  That makes me a bit wary of QEMU 
growing quite a lot of compression mechanisms.


It makes me think that this logic may really belong at a higher level 
where more information is known about the workload.  For instance, I 
can imagine XBZRLE living in something like libvirt.


A better model would be plugin based.

--
error compiling committee.c: too many arguments to function

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [Qemu-devel] live snapshot wiki updated

2011-07-22 Thread Avi Kivity

On 07/20/2011 04:51 PM, Kevin Wolf wrote:

>
>  The problem is that QEMU will find backing file names inside the
>  images which it will be unable to open. How do you suggest we get around
>  that?

This is the part with allowing libvirt to override the backing file. Of
course, this is not something that we can add with five lines of code,
it requires -blockdev.


It can be done without blockdev.  Have a dictionary that translates 
filenames, and populate it from the command line (for a bonus, translate 
a filename to a file descriptor inherited from the caller or passed via 
the monitor).
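A minimal sketch of that translation dictionary (the helper names here
are hypothetical, not QEMU code):

```python
# Sketch of the filename-translation dictionary: before opening the
# backing file named inside an image, consult an override map populated
# from the command line (or the monitor).  Helper names are
# hypothetical, not QEMU code.

translations = {}

def add_translation(recorded_name, replacement):
    """E.g. populated from a hypothetical -backing-translate old=new option."""
    translations[recorded_name] = replacement

def resolve_backing(recorded_name):
    """Return the name (or fd reference) QEMU should actually open."""
    return translations.get(recorded_name, recorded_name)

add_translation("/old/pool/base.qcow2", "/new/pool/base.qcow2")
assert resolve_backing("/old/pool/base.qcow2") == "/new/pool/base.qcow2"
assert resolve_backing("/other.img") == "/other.img"  # untranslated names pass through
```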


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [Qemu-devel] [PATCH v2] Add support for fd: protocol

2011-06-21 Thread Avi Kivity

On 06/20/2011 10:11 PM, Anthony Liguori wrote:

It would need careful explanation in the management tool author's guide,
yes.

The main advantage is generality. It doesn't assume that a file format
has just one backing file, and doesn't require new syntax wherever a
file is referred to indirectly.



FWIW, with blockdev, we need options to control this all anyway.  If 
you go back to my QCFG proposal, the parameters would actually be 
format specific, so if we had:


-block 
file=fd:4,format=fancypantsformat,part0=hd0-back.part1,part1=hd0-back.part2...


Yeah.  We either name the formal argument (your proposal) or the actual 
argument (mine).


--
error compiling committee.c: too many arguments to function



Re: [libvirt] [Qemu-devel] [PATCH v2] Add support for fd: protocol

2011-06-20 Thread Avi Kivity

On 06/20/2011 04:50 PM, Anthony Liguori wrote:

On 06/20/2011 08:40 AM, Avi Kivity wrote:

On 06/14/2011 04:31 PM, Corey Bryant wrote:

- Starting Qemu with a backing file



For this we could tell qemu that a file named "xyz" is available via fd
n, via an extension of the getfd command.

For example

(qemu) getfd path="/images/my-image.img"
(qemu) getfd path="/images/template.img"
(qemu) drive-add path="/images/my-image.img"

The open() for my-image.img first looks up the name in the getfd
database, and finds it, so it returns the fd from there instead of
opening. It then opens the backing file ("template.img") and looks it up
again, and finds the second fd from the session.


The way I've been thinking about this is:

 -blockdev id=hd0-back,file=fd:4,format=raw \
 -blockdev file=fd:3,format=qcow2,backing=hd0-back

While your proposal is clever, it makes me a little nervous about 
subtle security ramifications.


It would need careful explanation in the management tool author's guide, 
yes.


The main advantage is generality.  It doesn't assume that a file format 
has just one backing file, and doesn't require new syntax wherever a 
file is referred to indirectly.


--
error compiling committee.c: too many arguments to function



Re: [libvirt] [Qemu-devel] [PATCH v2] Add support for fd: protocol

2011-06-20 Thread Avi Kivity

On 06/14/2011 04:31 PM, Corey Bryant wrote:

   - Starting Qemu with a backing file



For this we could tell qemu that a file named "xyz" is available via fd 
n, via an extension of the getfd command.


For example

  (qemu) getfd path="/images/my-image.img"
  (qemu) getfd path="/images/template.img"
  (qemu) drive-add path="/images/my-image.img"

The open() for my-image.img first looks up the name in the getfd 
database, and finds it, so it returns the fd from there instead of 
opening.  It then opens the backing file ("template.img") and looks it 
up again, and finds the second fd from the session.


The result is that open()s are satisfied from the monitor, instead of 
the host kernel, but without reversing the request/reply nature of the 
monitor protocol.


A similar extension could be added to the command line:

  qemu -drive file=fd:4,cache=none -path-alias 
name=/images/template.img,path=fd:5


Here the main image is opened via a fd 4; if it needs template.img, it 
gets shunted to fd 5.
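A toy model of that lookup order, with hypothetical names (`FdTable`, `getfd`, `open`) standing in for the proposed monitor machinery:

```python
import os

class FdTable:
    # Open requests consult the monitor-populated table first and only
    # fall back to the host kernel on a miss (illustrative sketch).

    def __init__(self):
        self.aliases = {}              # path -> fd handed in via getfd

    def getfd(self, path, fd):
        self.aliases[path] = fd        # monitor: getfd path=...

    def open(self, path):
        if path in self.aliases:
            return self.aliases[path]  # satisfied from the monitor
        return os.open(path, os.O_RDONLY)  # ordinary host open
```

With both the image and its backing file registered, opening the image resolves from the table, and the nested open of the backing file name resolves the same way, so qemu never needs filesystem access of its own.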


--
error compiling committee.c: too many arguments to function



Re: [libvirt] [Qemu-devel] [PATCH v2 3/3] raw-posix: Re-open host CD-ROM after media change

2011-04-05 Thread Avi Kivity

On 04/05/2011 12:12 PM, Amit Shah wrote:

On (Tue) 05 Apr 2011 [12:00:38], Avi Kivity wrote:
>  On 04/05/2011 11:09 AM, Amit Shah wrote:
>  >On (Tue) 05 Apr 2011 [10:48:16], Avi Kivity wrote:
>  >>   On 04/05/2011 09:41 AM, Amit Shah wrote:
>  >>   >See http://www.spinics.net/lists/linux-scsi/msg51504.html
>  >>
>  >>   I see this is quite fresh.  What are the plans here?
>  >
>  >We're still discussing where the fix should be, but it certainly is a
>  >kernel bug and should be fixed there, and then applied to stable.
>  >
>  >However, there are other bugs in qemu which will prevent the right
>  >size changes to be visible in the guest (the RFC series I sent out
>  >earlier in this thread need to be applied to QEMU at the least, the
>  >series has grown in my development tree since the time I sent that one
>  >out).  So essentially we need to update both, the hypervisor and the
>  >guest to get proper CDROM media change support.
>
>  Why do we need to update the guest for a qemu bug?  What is the qemu bug?

Guest kernel bug: the CDROM change event is missed, so the revalidate
call isn't made, which causes stale data (like disc size) to be used
on newer media.

qemu bug: We don't handle the GET_EVENT_STATUS_NOTIFICATION command
from guests (a mandatory command according to the SCSI spec) which the
guest uses to detect CDROM changes.  Once this command is implemented,
QEMU sends the required info the guest needs to detect CDROM changes.
I have this implemented locally (also sent as RFC PATCH 2/3 in the
'cdrom bug roundup' thread).

So: even if qemu is updated to handle this command, the guest won't
work correctly since it misses the event.


Okay.  We aren't responsible for guest kernel bugs, especially those 
which apply to real hardware (we should make more effort for virtio 
bugs).  It's enough that we fix qemu here.



>  >It also looks like we can't have a workaround in QEMU to get older
>  >guests to work.
>
>  Older guests?  or older hosts?

Older guests (not patched with fix for the bug described above).

Since the guest kernel completely misses the disc change event in the
path that does the revalidation, there's nothing qemu can do that will
make such older guests notice disc change.

Also: if only the guest kernel is updated but qemu is not, things still
won't work since qemu will never send valid information for the
GET_EVENT_STATUS_NOTIFICATION command.

>  >However, a hack in the kernel can be used without any QEMU changes
>  >(revalidate disk on each sr_open() call, irrespective of detecting any
>  >media change).  I'm against doing that for upstream, but downstreams
>  >could do that for new guest - old hypervisor compat.
>
>  Seriously confused.  Please use the terms "host kernel" and "qemu"
>  instead of "hypervisor" which is ambiguous.

OK: this last bit says that forcefully revalidating discs in the guest
kernel when a guest userspace opens the disc will ensure size changes
are reflected properly for guest userspace.  So in this case, even if
we're using an older qemu which doesn't implement
GET_EVENT_STATUS_NOTIFICATION, guest userspace apps will work fine.

This is obviously a hack.


Yes.  Thanks for the clarification.

(let's see if I really got it - we have a kernel bug that hit both the 
guest and the host, plus a qemu bug?)


--
error compiling committee.c: too many arguments to function



Re: [libvirt] [Qemu-devel] [PATCH v2 3/3] raw-posix: Re-open host CD-ROM after media change

2011-04-05 Thread Avi Kivity

On 04/05/2011 11:09 AM, Amit Shah wrote:

On (Tue) 05 Apr 2011 [10:48:16], Avi Kivity wrote:
>  On 04/05/2011 09:41 AM, Amit Shah wrote:
>  >See http://www.spinics.net/lists/linux-scsi/msg51504.html
>
>  I see this is quite fresh.  What are the plans here?

We're still discussing where the fix should be, but it certainly is a
kernel bug and should be fixed there, and then applied to stable.

However, there are other bugs in qemu which will prevent the right
size changes to be visible in the guest (the RFC series I sent out
earlier in this thread need to be applied to QEMU at the least, the
series has grown in my development tree since the time I sent that one
out).  So essentially we need to update both, the hypervisor and the
guest to get proper CDROM media change support.


Why do we need to update the guest for a qemu bug?  What is the qemu bug?


It also looks like we can't have a workaround in QEMU to get older
guests to work.


Older guests?  or older hosts?


However, a hack in the kernel can be used without any QEMU changes
(revalidate disk on each sr_open() call, irrespective of detecting any
media change).  I'm against doing that for upstream, but downstreams
could do that for new guest - old hypervisor compat.


Seriously confused.  Please use the terms "host kernel" and "qemu" 
instead of "hypervisor" which is ambiguous.


--
error compiling committee.c: too many arguments to function



Re: [libvirt] [Qemu-devel] [PATCH v2 3/3] raw-posix: Re-open host CD-ROM after media change

2011-04-05 Thread Avi Kivity

On 04/05/2011 09:41 AM, Amit Shah wrote:

See http://www.spinics.net/lists/linux-scsi/msg51504.html



I see this is quite fresh.  What are the plans here?

--
error compiling committee.c: too many arguments to function



Re: [libvirt] [Qemu-devel] [PATCH v2 3/3] raw-posix: Re-open host CD-ROM after media change

2011-04-04 Thread Avi Kivity

On 04/04/2011 06:09 PM, Stefan Hajnoczi wrote:

On Mon, Apr 4, 2011 at 2:49 PM, Avi Kivity  wrote:
>  On 04/04/2011 04:38 PM, Anthony Liguori wrote:
>>
>>  On 04/04/2011 08:22 AM, Avi Kivity wrote:
>>>
>>>  On 04/03/2011 02:57 PM, Stefan Hajnoczi wrote:
>>>>
>>>>  In order for media change to work with Linux host CD-ROM it is
>>>>  necessary to reopen the file (otherwise the inode size will not
>>>>  refresh, this is an issue with existing kernels).
>>>>
>>>
>>>  Maybe we should fix the bug in Linux (and backport as necessary)?
>>>
>>>  I think cd-rom assignment is sufficiently obscure that we can require a
>>>  fixed kernel instead of providing a workaround.
>>
>>  Do reads fail after CD change?  Or do they succeed and the size is just
>>  reported incorrectly?
>>
>>  If it's the latter, I'd agree that it needs fixing in the kernel.  If it's
>>  the former, I'd say it's clearly a feature.
>>
>
>  Even if it's a documented or intentional feature, we can add an ioctl to
>  "refresh" the device with up-to-date data.

It's possible to fix this in the kernel.  I just haven't written the
patch yet.  The inode size needs to be updated when the new medium is
detected.

I haven't tested but I suspect reads within the size of the previous
medium will succeed.  But if the new medium is larger then reads
beyond the old medium size will fail.

The size reported by lseek(fd, 0, SEEK_END) is outdated.


I believe a kernel fix is best in that case, leaving qemu alone.
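The reopen workaround being discussed can be sketched as a hypothetical helper (not QEMU code): reopening the device per query sidesteps the stale inode size on affected kernels.

```python
import os

def media_size(path):
    # Reopen the device for each query: on affected kernels the inode
    # size is only refreshed at open(), so a long-lived fd keeps
    # reporting the previous medium's size after a media change.
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.lseek(fd, 0, os.SEEK_END)
    finally:
        os.close(fd)
```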

--
error compiling committee.c: too many arguments to function



Re: [libvirt] [Qemu-devel] [PATCH v2 3/3] raw-posix: Re-open host CD-ROM after media change

2011-04-04 Thread Avi Kivity

On 04/04/2011 04:38 PM, Anthony Liguori wrote:

On 04/04/2011 08:22 AM, Avi Kivity wrote:

On 04/03/2011 02:57 PM, Stefan Hajnoczi wrote:

In order for media change to work with Linux host CD-ROM it is
necessary to reopen the file (otherwise the inode size will not
refresh, this is an issue with existing kernels).



Maybe we should fix the bug in Linux (and backport as necessary)?

I think cd-rom assignment is sufficiently obscure that we can require 
a fixed kernel instead of providing a workaround.


Do reads fail after CD change?  Or do they succeed and the size is 
just reported incorrectly?


If it's the latter, I'd agree that it needs fixing in the kernel.  If 
it's the former, I'd say it's clearly a feature.




Even if it's a documented or intentional feature, we can add an ioctl to 
"refresh" the device with up-to-date data.


--
error compiling committee.c: too many arguments to function



Re: [libvirt] [Qemu-devel] [PATCH v2 3/3] raw-posix: Re-open host CD-ROM after media change

2011-04-04 Thread Avi Kivity

On 04/03/2011 02:57 PM, Stefan Hajnoczi wrote:

In order for media change to work with Linux host CD-ROM it is
necessary to reopen the file (otherwise the inode size will not
refresh, this is an issue with existing kernels).



Maybe we should fix the bug in Linux (and backport as necessary)?

I think cd-rom assignment is sufficiently obscure that we can require a 
fixed kernel instead of providing a workaround.


--
error compiling committee.c: too many arguments to function



Re: [libvirt] Startup/Shutdown scripts for KVM Machines in Debian (libvirt)

2010-11-10 Thread Avi Kivity

On 11/10/2010 10:01 AM, Hermann Himmelbauer wrote:

Hi,
I manage my KVM machines via libvirt and wonder if there are any init.d
scripts for automatically starting up and shutting down virtual machines
during boot/shutdown of the host?

Writing this myself seems not that simple: when shutting down, the
system somehow has to wait until all machines are halted (unresponsive
guests have to be destroyed, etc.), and I don't really know how to
accomplish this.

My host system is Debian Lenny, is there anything available?
Perhaps libvirt offers something I'm unaware of?



I think it does.  Copying the libvirt mailing list.

--
error compiling committee.c: too many arguments to function



Re: [libvirt] [Qemu-devel] QEMU interfaces for image streaming and post-copy block migration

2010-09-12 Thread Avi Kivity

 On 09/12/2010 07:19 PM, Anthony Liguori wrote:

On 09/12/2010 11:45 AM, Avi Kivity wrote:

Streaming relies on copy-on-read to do the writing.



Ah.  You can avoid the copy-on-read implementation in the block 
format driver and do it completely in generic code.


Copy on read takes advantage of temporal locality.  You wouldn't want 
to stream without copy on read because you decrease your idle I/O time 
by not effectively caching.


I meant, implement copy-on-read in generic code side by side with 
streaming.  Streaming becomes just a prefetch operation (read and 
discard) which lets copy-on-read do the rest.  This is essentially your 
implementation, yes?
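That is, with copy-on-read underneath, streaming degenerates to a read-and-discard sweep. A minimal sketch with an illustrative `device` object (not a block-layer API):

```python
def stream(device, cluster_count):
    # Prefetch pass: read every cluster and discard the data; with
    # copy-on-read enabled, each read populates the local image as a
    # side effect, so the sweep alone completes the transfer.
    for cluster in range(cluster_count):
        device.read(cluster)  # result intentionally ignored
```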





stream_4():
   increment offset
   if more:
      bdrv_aio_stream()


Of course, need to serialize wrt guest writes, which adds a bit 
more complexity.  I'll leave it to you to code the state machine 
for that.


http://repo.or.cz/w/qemu/aliguori.git/commitdiff/d44ea43be084cc879cd1a33e1a04a105f4cb7637?hp=34ed425e7dd39c511bc247d1ab900e19b8c74a5d 



Clever - it pushes all the synchronization into the copy-on-read 
implementation.  But the serialization there hardly jumps out of the 
code.


Do I understand correctly that you can only have one allocating read 
or write running?


Cluster allocation, L2 cache allocation, or on-disk L2 allocation?

You only have one on-disk L2 allocation at one time.  That's just an 
implementation detail at the moment.  An on-disk L2 allocation happens 
only when writing to a new cluster that requires a totally new L2 
entry.  Since L2s cover 2GB of logical space, it's a rare event so 
this turns out to be pretty reasonable for a first implementation.


Parallel on-disk L2 allocations is not that difficult, it's just a 
future TODO.


Really, you can just preallocate all L2s.  Most filesystems will touch 
all of them very soon.  qcow2 might save some space for snapshots which 
share L2s (doubtful) or for 4k clusters (historical) but for qed with 
64k clusters, it doesn't save any space.


Linear L2s will also make your fsck *much* quicker.  Size is .01% of 
logical image size.  1MB for a 10GB guest, by the time you install 
something on it that's a drop in the bucket.


If you install a guest on a 100GB disk, what percentage of L2s are 
allocated?






Generally, I think the block layer makes more sense if the interfaces 
to the formats are high level and code sharing is achieved not by 
mandating a world view but rather by making libraries of common 
functionality.  This is more akin to how the FS layer works in Linux.


So IMHO, we ought to add a bdrv_aio_commit function, turn the 
current code into a generic_aio_commit, implement a qed_aio_commit, 
then somehow do qcow2_aio_commit, and look at what we can refactor 
into common code.


What Linux does is have an equivalent of bdrv_generic_aio_commit() 
which most implementations call (or default to), and only do 
something if they want something special.  Something like commit (or 
copy-on-read, or copy-on-write, or streaming) can be implemented 100% 
in terms of the generic functions (and indeed qcow2 backing files can 
be any format).


Yes, what I'm really saying is that we should take the 
bdrv_generic_aio_commit() approach.  I think we're in agreement here.




Strange feeling.

--
error compiling committee.c: too many arguments to function



Re: [libvirt] [Qemu-devel] QEMU interfaces for image streaming and post-copy block migration

2010-09-12 Thread Avi Kivity

 On 09/12/2010 05:23 PM, Anthony Liguori wrote:

On 09/12/2010 08:40 AM, Avi Kivity wrote:
Why would it serialize all I/O operations?  It's just like another 
vcpu issuing reads.


Because the block layer isn't re-entrant.


A threaded block layer is reentrant.  Of course pushing the thing into a 
thread requires that.





What you basically do is:

stream_step_three():
   complete()

stream_step_two(offset, length):
   bdrv_aio_readv(offset, length, buffer, stream_step_three)

bdrv_aio_stream():
   bdrv_aio_find_free_cluster(stream_step_two)


Isn't there a write() missing somewhere?


Streaming relies on copy-on-read to do the writing.


Ah.  You can avoid the copy-on-read implementation in the block format 
driver and do it completely in generic code.






And that's exactly what the current code looks like.  The only 
change to the patch that this does is make some of qed's internals 
be block layer interfaces.


Why do you need find_free_cluster()?  That's a physical offset 
thing.  Just write to the same logical offset.


IOW:

bdrv_aio_stream():
   bdrv_aio_read(offset, stream_2)


It's an optimization.  If you've got a fully missing L1 entry, then 
you're going to memset() 2GB worth of zeros.  That's just wasted 
work.  With a 1TB image with a 1GB allocation, it's a huge amount of 
wasted work.


Ok.  And it's a logical offset, not physical as I thought, which 
confused me.





stream_2():
   if all zeros:
      increment offset
      if more:
         bdrv_aio_stream()
   bdrv_aio_write(offset, stream_3)

stream_3():
   bdrv_aio_write(offset, stream_4)


I don't understand why stream_3() is needed.


This implementation doesn't rely on copy-on-read code in the block 
format driver.  It is generic and uses existing block layer interfaces.  
It would need copy-on-read support in the generic block layer as well.





stream_4():
   increment offset
   if more:
      bdrv_aio_stream()


Of course, need to serialize wrt guest writes, which adds a bit more 
complexity.  I'll leave it to you to code the state machine for that.


http://repo.or.cz/w/qemu/aliguori.git/commitdiff/d44ea43be084cc879cd1a33e1a04a105f4cb7637?hp=34ed425e7dd39c511bc247d1ab900e19b8c74a5d 



Clever - it pushes all the synchronization into the copy-on-read 
implementation.  But the serialization there hardly jumps out of the code.


Do I understand correctly that you can only have one allocating read or 
write running?



Parts of it are: commit.  Of course, that's horribly synchronous.


If you've got AIO internally, making commit work is pretty easy.  
Doing asynchronous commit at a generic layer is not easy though unless 
you expose lots of details.


I don't see why.  Commit is a simple loop that copies all clusters.  All 
it needs to know is if a cluster is allocated or not.


When commit is running you need additional serialization against guest 
writes, and to direct guest writes and reads to the committed region to 
the backing file instead of the temporary image.  But the block layer 
already knows of all guest writes.




Generally, I think the block layer makes more sense if the interfaces 
to the formats are high level and code sharing is achieved not by 
mandating a world view but rather by making libraries of common 
functionality.  This is more akin to how the FS layer works in Linux.


So IMHO, we ought to add a bdrv_aio_commit function, turn the current 
code into a generic_aio_commit, implement a qed_aio_commit, then 
somehow do qcow2_aio_commit, and look at what we can refactor into 
common code.


What Linux does is have an equivalent of bdrv_generic_aio_commit() which 
most implementations call (or default to), and only do something if they 
want something special.  Something like commit (or copy-on-read, or 
copy-on-write, or streaming) can be implemented 100% in terms of the 
generic functions (and indeed qcow2 backing files can be any format).


--
error compiling committee.c: too many arguments to function



Re: [libvirt] [Qemu-devel] QEMU interfaces for image streaming and post-copy block migration

2010-09-12 Thread Avi Kivity

 On 09/12/2010 03:25 PM, Anthony Liguori wrote:

On 09/12/2010 07:41 AM, Avi Kivity wrote:

 On 09/07/2010 05:57 PM, Anthony Liguori wrote:

I agree that streaming should be generic, like block migration.  The
trivial generic implementation is:

void bdrv_stream(BlockDriverState* bs)
{
 for (sector = 0; sector < bdrv_getlength(bs); sector += n) {
 if (!bdrv_is_allocated(bs, sector, &n)) {


Three problems here.  First problem is that bdrv_is_allocated is 
synchronous. 


Put the whole thing in a thread.


It doesn't fix anything.  You don't want stream to serialize all I/O 
operations.


Why would it serialize all I/O operations?  It's just like another vcpu 
issuing reads.




The second problem is that streaming makes the most sense when it's 
the smallest useful piece of work whereas bdrv_is_allocated() may 
return a very large range.


You could cap it here but you then need to make sure that cap is at 
least cluster_size to avoid a lot of unnecessary I/O.


That seems like a nice solution.  You probably want a multiple of the 
cluster size to retain efficiency.


What you basically do is:

stream_step_three():
   complete()

stream_step_two(offset, length):
   bdrv_aio_readv(offset, length, buffer, stream_step_three)

bdrv_aio_stream():
   bdrv_aio_find_free_cluster(stream_step_two)


Isn't there a write() missing somewhere?



And that's exactly what the current code looks like.  The only change 
to the patch that this does is make some of qed's internals be block 
layer interfaces.


Why do you need find_free_cluster()?  That's a physical offset thing.  
Just write to the same logical offset.


IOW:

bdrv_aio_stream():
   bdrv_aio_read(offset, stream_2)

stream_2():
   if all zeros:
      increment offset
      if more:
         bdrv_aio_stream()
   bdrv_aio_write(offset, stream_3)

stream_3():
   bdrv_aio_write(offset, stream_4)

stream_4():
   increment offset
   if more:
      bdrv_aio_stream()


Of course, need to serialize wrt guest writes, which adds a bit more 
complexity.  I'll leave it to you to code the state machine for that.
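A toy version of that callback chain with a fake synchronous device object ("aio" calls complete immediately; zero detection only, and the guest-write serialization is omitted as noted). Names are illustrative, not QEMU's:

```python
class FakeDev:
    # Synchronous stand-in: each "aio" call invokes its callback at once.
    def __init__(self, clusters):
        self.clusters = clusters           # list of bytes objects

    def aio_read(self, i, cb):
        cb(self.clusters[i])

    def aio_write(self, i, data, cb):
        self.clusters[i] = data
        cb()

def aio_stream(dev, offset=0, on_done=lambda: None):
    # One step of the state machine: read a cluster, copy it up unless
    # it is all zeros, then move on to the next cluster or finish.
    def advance():
        if offset + 1 < len(dev.clusters):
            aio_stream(dev, offset + 1, on_done)
        else:
            on_done()

    def step2(data):
        if any(data):                      # not all zeros: copy up
            dev.aio_write(offset, data, advance)
        else:
            advance()                      # all zeros: skip the write

    dev.aio_read(offset, step2)
```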




One of the things Stefan has mentioned is that a lot of the QED code 
could be reused by other formats.  All formats implement things like 
CoW on their own today but if you exposed interfaces like 
bdrv_aio_find_free_cluster(), you could actually implement a lot more 
in the generic block layer.


So, I agree with you in principle that this all should be common 
code.  I think it's a larger effort though.


Not that large I think; and it will make commit async as a side effect.



The QED streaming implementation is 140 LOCs too so you quickly end 
up adding more code to the block formats to support these new 
interfaces than it takes to just implement it in the block format.


bdrv_is_allocated() already exists (and is needed for commit), what 
else is needed?  cluster size?


Synchronous implementations are not reusable to implement asynchronous 
anything. 


Surely this is easy to fix, at least for qed.

What we need is thread infrastructure that allows us to convert between 
the two methods.



But you need the code to be cluster aware too.


Yes, another variable in BlockDriverState.



Third problem is that  streaming really requires being able to do 
zero write detection in a meaningful way.  You don't want to always 
do zero write detection so you need another interface to mark a 
specific write as a write that should be checked for zeros.


You can do that in bdrv_stream(), above, before the actual write, and 
call bdrv_unmap() if you detect zeros.


My QED branch now does that FWIW.  At the moment, it only detects zero 
reads to unallocated clusters and writes a special zero cluster 
marker.  However, the detection code is in the generic path so once 
the fsck() logic is working, we can implement a free list in QED.


In QED, the detection code needs to have a lot of knowledge about 
cluster boundaries and the format of the device.  In principle, this 
should be common code but it's not for the same reason copy-on-write 
is not common code today.


Parts of it are: commit.  Of course, that's horribly synchronous.

--
error compiling committee.c: too many arguments to function



Re: [libvirt] [Qemu-devel] QEMU interfaces for image streaming and post-copy block migration

2010-09-12 Thread Avi Kivity

 On 09/07/2010 05:57 PM, Anthony Liguori wrote:

I agree that streaming should be generic, like block migration.  The
trivial generic implementation is:

void bdrv_stream(BlockDriverState* bs)
{
 for (sector = 0; sector < bdrv_getlength(bs); sector += n) {
 if (!bdrv_is_allocated(bs, sector, &n)) {


Three problems here.  First problem is that bdrv_is_allocated is 
synchronous. 


Put the whole thing in a thread.

The second problem is that streaming makes the most sense when it's 
the smallest useful piece of work whereas bdrv_is_allocated() may 
return a very large range.


You could cap it here but you then need to make sure that cap is at 
least cluster_size to avoid a lot of unnecessary I/O.


That seems like a nice solution.  You probably want a multiple of the 
cluster size to retain efficiency.




The QED streaming implementation is 140 LOCs too so you quickly end up 
adding more code to the block formats to support these new interfaces 
than it takes to just implement it in the block format.


bdrv_is_allocated() already exists (and is needed for commit), what else 
is needed?  cluster size?


Third problem is that  streaming really requires being able to do zero 
write detection in a meaningful way.  You don't want to always do zero 
write detection so you need another interface to mark a specific write 
as a write that should be checked for zeros.


You can do that in bdrv_stream(), above, before the actual write, and 
call bdrv_unmap() if you detect zeros.
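That is, a per-cluster branch of roughly this shape (illustrative names, not the block-layer API):

```python
def stream_cluster(read, write, unmap, cluster):
    # Copy-up step with zero detection: all-zero data is unmapped in
    # the destination instead of being written out.
    data = read(cluster)
    if any(data):
        write(cluster, data)
    else:
        unmap(cluster)
```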


--
error compiling committee.c: too many arguments to function



Re: [libvirt] [Qemu-devel] QEMU interfaces for image streaming and post-copy block migration

2010-09-12 Thread Avi Kivity

 On 09/07/2010 04:41 PM, Anthony Liguori wrote:

Hi,

We've got copy-on-read and image streaming working in QED and before 
going much further, I wanted to bounce some interfaces off of the 
libvirt folks to make sure our final interface makes sense.


Here's the basic idea:

Today, you can create images based on base images that are copy on 
write.  With QED, we also support copy on read which forces a copy 
from the backing image on read requests and write requests.


Is copy on read QED specific?  It looks very similar to the commit 
command, except with I/O directions reversed.


IIRC, commit looks like

  for each sector:
      if image.mapped(sector):
          backing_image.write(sector, image.read(sector))

whereas copy-on-read looks like:

  def copy_on_read():
      set_ioprio(idle)
      for each sector:
          if not image.mapped(sector):
              image.write(sector, backing_image.read(sector))
  run_in_thread(copy_on_read)

With appropriate locking.
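A runnable toy of the two loops over dict-backed images (sector number mapped to data; names illustrative):

```python
def commit(image, backing):
    # Push every sector allocated in the leaf down into the backing image.
    for sector, data in image.items():
        backing[sector] = data

def copy_on_read_pass(image, backing):
    # Pull every sector the leaf lacks up from the backing image --
    # the same walk with the I/O direction reversed.
    for sector, data in backing.items():
        if sector not in image:
            image[sector] = data
```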



In additional to copy on read, we introduce a notion of streaming a 
block device which means that we search for an unallocated region of 
the leaf image and force a copy-on-read operation.


The combination of copy-on-read and streaming means that you can start 
a guest based on slow storage (like over the network) and bring in 
blocks on demand while also having a deterministic mechanism to 
complete the transfer.


The interface for copy-on-read is just an option within qemu-img 
create.  Streaming, on the other hand, requires a bit more thought.  
Today, I have a monitor command that does the following:


stream <device> <offset>

Which will try to stream the minimal amount of data for a single I/O 
operation and then return how many sectors were successfully streamed.


The idea about how to drive this interface is a loop like:

offset = 0;
while offset < image_size:
   wait_for_idle_time()
   count = stream(device, offset)
   offset += count
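Sketched as management-side code, with `stream` and `wait_for_idle_time` as assumed hooks supplied by the caller:

```python
def drive_stream(stream, device, image_size, wait_for_idle_time):
    # The proposed management loop: whenever the host is idle, stream
    # the next minimal chunk and advance by however much was copied.
    offset = 0
    while offset < image_size:
        wait_for_idle_time()
        offset += stream(device, offset)
    return offset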
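Sketched as management-side code, with `stream` and `wait_for_idle_time` as assumed hooks supplied by the caller:

```python
def drive_stream(stream, device, image_size, wait_for_idle_time):
    # The proposed management loop: whenever the host is idle, stream
    # the next minimal chunk and advance by however much was copied.
    offset = 0
    while offset < image_size:
        wait_for_idle_time()
        offset += stream(device, offset)
    return offset
```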



This is way too low level for the management stack.

Have you considered using the idle class I/O priority to implement 
this?  That would allow host-wide prioritization.  Not sure how to do 
cluster-wide, I don't think NFS has the concept of I/O priority.



--
error compiling committee.c: too many arguments to function



Re: [libvirt] KVM doesn't send an arp announce after live migrating a domain

2010-08-25 Thread Avi Kivity

 On 08/25/2010 02:42 PM, Daniel P. Berrange wrote:



Is virt-manager able to drive this?  it would be great if you could
drive everything from there.

Yes, it does now, under the menu Edit -> Host Details -> Network Interfaces.
NetworkManager has also finally learnt to ignore ifcfg-XXX files which
have a BRIDGE= setting in them, so it shouldn't totally trash your guest
bridge networking if you leave NM running.


Cool.  I guess what remains is to get people to unlearn all the previous 
hacks.


(also would be nice to have libvirt talk to NetworkManager instead of 
/etc/sysconfig)


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [libvirt] KVM doesn't send an arp announce after live migrating a domain

2010-08-25 Thread Avi Kivity

 On 08/25/2010 02:36 PM, Daniel P. Berrange wrote:



Can't libvirt also create a non-NAT bridge?  Looks like it would prevent
a lot of manual work and opportunity for misconfiguration.

Yes, it can on latest Fedora/RHEL6, using the netcf library. This is the
new 'virsh iface-XXX' command set (and equivalent APIs). I've not updated
the docs to cover this functionality yet though. It also does bonding,
and vlans, etc


Great.

Is virt-manager able to drive this?  it would be great if you could 
drive everything from there.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [libvirt] KVM doesn't send an arp announce after live migrating a domain

2010-08-25 Thread Avi Kivity

 On 08/25/2010 02:15 PM, Daniel P. Berrange wrote:



So it looks like the default config uses the kernel default?  If libvirt
uses an existing bridge I agree it shouldn't hack it, but if it creates
its own can't it use a sensible default?

That is the NAT virtual network. That one *does* default to a forward
delay of 0, but since it is NAT, it is fairly useless for migration
in anycase. If you do 'virsh net-dumpxml default' you should see that
delay='0' was added

The OP was using bridging rather than NAT though, so this XML example
doesn't apply. My comments about libvirt not overriding kenrel policy
for forward delay were WRT full bridging mode, not the NAT mode[1]


Yes, of course.

Can't libvirt also create a non-NAT bridge?  Looks like it would prevent 
a lot of manual work and opportunity for misconfiguration.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [libvirt] KVM doesn't send an arp announce after live migrating a domain

2010-08-25 Thread Avi Kivity

 On 08/25/2010 01:52 PM, Daniel P. Berrange wrote:



I think libvirt is doing something about this, copying list for further
info.

libvirt doesn't set a policy for this. It provides an API for
configuring host networking, but we don't override the kernel's
forward delay policy, since we don't presume that all bridges
are going to have VMs attached. In any case the API isn't available
for Debian yet, since no one has ported netcf to Debian, so I
assume the OP set bridging up manually. The '15' second default is
actually a kernel level default IIRC.

The two main host network configs recommended for use with libvirt+KVM
(either NAT or bridging) are documented here:

   http://wiki.libvirt.org/page/Networking


From that page:

# virsh net-define /usr/share/libvirt/networks/default.xml

From my copy of that file:

<network>
  <name>default</name>
  <bridge name="virbr0" />
  <forward/>
  <ip address="192.168.122.1" netmask="255.255.255.0">
    <dhcp>
      <range start="192.168.122.2" end="192.168.122.254" />
    </dhcp>
  </ip>
</network>

So it looks like the default config uses the kernel default?  If libvirt 
uses an existing bridge I agree it shouldn't hack it, but if it creates 
its own can't it use a sensible default?



--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [libvirt] KVM doesn't send an arp announce after live migrating a domain

2010-08-25 Thread Avi Kivity

 On 08/25/2010 12:21 PM, Nils Cant wrote:

On 08/25/2010 10:38 AM, Gleb Natapov wrote:
qemu sends gratuitous ARP after migration. Check forward delay setting
on your bridge interface. It should be set to zero.



Aha! That fixed it. Turns out that debian bridge-utils sets the 
default to 15 for bridges.
Manually setting it to 0 with 'brctl setfd br0 0' or setting the 
'bridge_fd' parameter to 0 in /etc/network/interfaces solves the issue.
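Spelled out, the two fixes just mentioned (bridge and port names are examples):

```
# one-off, takes effect immediately:
#   brctl setfd br0 0

# persistent, in /etc/network/interfaces:
auto br0
iface br0 inet dhcp
    bridge_ports eth0
    bridge_fd 0
```

With a zero forward delay the bridge forwards immediately, so the gratuitous ARP sent at the end of migration is not dropped.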




I think libvirt is doing something about this, copying list for further 
info.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [libvirt] [Qemu-devel] Re: [PATCH] Introduce a -libvirt-caps flag as a stop-gap

2010-07-27 Thread Avi Kivity

 On 07/27/2010 07:38 PM, Anthony Liguori wrote:
I'm going to revert the -help changes for 0.13 so that old versions of 
libvirt work but not for master.


What is the goal here?   Make qemu.git explicitly be unusable via libvirt?

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [libvirt] [Qemu-devel] Re: Libvirt debug API

2010-04-26 Thread Avi Kivity

On 04/26/2010 05:48 PM, Anthony Liguori wrote:


  We could easily reuse that.  Any other security context code would 
be custom written; so it can be written as a qemud plugin instead of 
a bit of code that goes before a qemu launch.


I think we're mostly in agreement with respect to the need to have 
more control over the security context the qemu runs in.  Whether it's 
launched via a daemon or directly I think is an implementation detail 
that we can debate when we get closer to an actual implementation.




Good, as I haven't decided yet which side I'm on.

--
error compiling committee.c: too many arguments to function



Re: [libvirt] [Qemu-devel] Re: Libvirt debug API

2010-04-26 Thread Avi Kivity

On 04/26/2010 05:28 PM, Anthony Liguori wrote:
Or a library that the user-written launcher calls.  Or a plugin that 
qemud calls.



A plugin would lose the security context.  It could attempt to 
recreate it that seems like a lot of unnecessary complexity.




A plugin would create the security context instead of the launcher.

Currently security contexts are created by the login process.  We could 
easily reuse that.  Any other security context code would be custom 
written; so it can be written as a qemud plugin instead of a bit of code 
that goes before a qemu launch.


--
error compiling committee.c: too many arguments to function



Re: [libvirt] [Qemu-devel] Re: Libvirt debug API

2010-04-26 Thread Avi Kivity

On 04/26/2010 05:25 PM, Chris Lalancette wrote:

Right, and you are probably one of the users this work targets.  But in
general, for those not very familiar with virtualization/qemu, we want
to steer them far clear of this API.  That goes doubly true for application
developers; we want them to be able to use a stable, long-term API and
not have to worry about the nitty-gritty details of the monitor.  It's that
latter group that we want to make sure doesn't use this API.
   


With qmp, we have a stable long term API, and the nitty-gritty details 
are easily hidden behind a stock json parser (unfortunately some rpc 
details remain).  The command line is baroque, but the libvirt xml isn't 
so pretty either.


The problem is a user that starts with libvirt and outgrows its 
featureset.  Do we want them to fall back to qmp?


--
error compiling committee.c: too many arguments to function



Re: [libvirt] [Qemu-devel] Re: Libvirt debug API

2010-04-26 Thread Avi Kivity

On 04/26/2010 05:19 PM, Anthony Liguori wrote:

On 04/26/2010 09:01 AM, Avi Kivity wrote:

On 04/26/2010 04:43 PM, Anthony Liguori wrote:
The reason I lean toward the direct launch model is that it gives 
the user a lot of flexibility in terms of using things like 
namespaces, DAC, cgroups, capabilities, etc.  A lot of potential 
features are lost when you do indirect launch because you have to 
teach the daemon how to support each of these features.


But what's the alternative?  Teach the user how to do all these things?


You can expose layers of API.  The lowest layer makes no changes to 
the security context.  A higher (optional) layer could do dynamic 
labelling.


Or a library that the user-written launcher calls.  Or a plugin that 
qemud calls.


It's infinitely flexible, but it's not an API you can give to a 
management tool developer.


I think the goal of a management API should be to make common things 
very simple to do but not preclude doing even the most advanced things.


Agreed.

--
error compiling committee.c: too many arguments to function



Re: [libvirt] [Qemu-devel] Re: Libvirt debug API

2010-04-26 Thread Avi Kivity

On 04/26/2010 04:43 PM, Anthony Liguori wrote:
The reason I lean toward the direct launch model is that it gives the 
user a lot of flexibility in terms of using things like namespaces, 
DAC, cgroups, capabilities, etc.  A lot of potential features are lost 
when you do indirect launch because you have to teach the daemon how 
to support each of these features.


But what's the alternative?  Teach the user how to do all these things?

It's infinitely flexible, but it's not an API you can give to a 
management tool developer.


--
error compiling committee.c: too many arguments to function



Re: [libvirt] [Qemu-devel] Re: Libvirt debug API

2010-04-26 Thread Avi Kivity

On 04/26/2010 04:46 PM, Anthony Liguori wrote:
(3) The system management application can certainly create whatever 
context it wants to launch a vm from.  It comes down to who's 
responsible for creating the context the guest runs under.  I think 
doing that at the libvirt level takes away a ton of flexibility from 
the management application.


If you want to push the flexibility slider all the way to the right 
you get bare qemu.  It exposes 100% of qemu capabilities.  And it's 
not so bad these days.  But it's not something that can be remoted.


As I mentioned earlier, remoting is not a very important use-case to me.

Does RHEV-M actually use the remote libvirt interface?  I assume it'll 
talk to vdsm via some protocol and vdsm will use the local libvirt API. 


Yes.


I suspect most uses of libvirt are actually local uses.


I expect the same, though I'm sure a design goal was to make use of 
libvirt be reasonable through the remote API.  If we aren't able to 
fulfil it, much of the value of libvirt goes away.


--
error compiling committee.c: too many arguments to function



Re: [libvirt] [Qemu-devel] Re: Libvirt debug API

2010-04-26 Thread Avi Kivity

On 04/26/2010 04:14 PM, Anthony Liguori wrote:


IOW, libvirt does not run guests as separate users which is why it 
needs to deal with security in the first place.


What if one user has multiple guests?  isolation is still needed.



Don't confuse a management application's concept of users with using 
separate uid's to launch guests.


Then someone needs to manage those users.  A user can't suid to any 
random user.  You need someone privileged to allocate the new uid and su 
into it.




One user per guest does not satisfy some security requirements.  The 
'M' in selinux stands for mandatory, which means that the entities 
secured can't leak information even if they want to (scenario: G1 
breaks into qemu, chmods files, G2 breaks into qemu, reads files).


If you're implementing a Chinese wall policy, then yes, you want 
to run each guest as a separate selinux context.  Starting as separate 
users and setting DAC privileges appropriately will achieve this.


But you're not always implementing that type of policy.  If the guest 
inherits the uid, selinux context, and namespaces of whatever launches 
the guest, then you have the most flexibility from a security 
perspective.


How do you launch a libvirt guest in a network namespace?  How do you 
put it in a chroot? 


You pass the namespace fd and chroot fd using SCM_RIGHTS (except you 
probably can't do that).
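For ordinary file descriptors this is in fact possible: SCM_RIGHTS carries open fds across a Unix socket (namespace fds only became a kernel concept later, with /proc/PID/ns and setns()). A minimal sketch using Python 3.9's socket.send_fds/recv_fds wrappers; the launcher/daemon split here is illustrative:

```python
import os
import socket
import tempfile

def send_fd(sock, fd):
    # SCM_RIGHTS ancillary data: the kernel installs a duplicate of fd
    # in the receiving process's descriptor table.
    socket.send_fds(sock, [b"fd"], [fd])

def recv_fd(sock):
    _msg, fds, _flags, _addr = socket.recv_fds(sock, 16, 1)
    return fds[0]

# The "launcher" hands an already-open file to the "daemon" end.
launcher, daemon = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
with tempfile.TemporaryFile() as f:
    f.write(b"guest disk image bytes")
    f.flush()
    os.lseek(f.fileno(), 0, os.SEEK_SET)
    send_fd(launcher, f.fileno())
    passed = recv_fd(daemon)

received = os.read(passed, 64)  # reads via the passed descriptor
os.close(passed)
launcher.close()
daemon.close()
```

The passed descriptor shares the open file description, so the daemon sees the file exactly as the launcher left it, offset included.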


Today, you have to make changes to libvirt whereas in a direct launch 
model, you get all of the neat security features linux supports for free.


But you lose tap networking, unless you have a privileged helper.  And 
how is the privileged helper to authenticate the qemu calling it?



And I've said in the past that I don't like the idea of a qemud :-)


I must have missed it.  Why not?  Every other hypervisor has a 
central management entity.


Because you end up launching all guests from a single security context.


Run multiple qemuds?

But what you say makes sense.  It's similar to the fork()  /* do 
interesting stuff */ exec() model, compared to the spawn(..., hardcoded 
list of interesting stuff).
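That comparison can be made concrete: in the fork/exec model the launcher runs arbitrary caller-supplied setup in the child before exec, instead of choosing from a fixed option list. A toy sketch, where the trivial setup stands in for chroot/namespace/label changes:

```python
import os

def launch(argv, setup):
    """fork(), run caller-supplied setup in the child, then exec()."""
    pid = os.fork()
    if pid == 0:
        setup()  # arbitrary "interesting stuff": chroot, setuid, labels...
        os.execv(argv[0], argv)
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)

# A spawn()-style API would have to enumerate every such knob up front;
# here the setup callable can do anything the caller's privileges allow.
code = launch(["/bin/true"], setup=lambda: os.chdir("/"))
```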


Yeah, that's where I'm at.  I'd eventually like libvirt to use our 
provided API and I can see where it would add value to the stack (by 
doing things like storage and network management).


We do provide an API, qmp, and libvirt uses it?


Yeah, but we need to support more features (like guest enumeration).



What are our options?

1) qemud launches, enumerates
2) user launches, qemu registers in qemud
3) user launches, qemu registers in filesystem
4) you launched it, you enumerate it

That's wrong for three reasons.  First, selinux is not a uid 
replacement (if it was libvirt could just suid $random_user before 
launching qemu).  Second, a single user's guests should be protected 
from each other.  Third, in many deployments, the guest's owner isn't 
logged in to supply the credentials, it's system management that 
launches the guests.


(1) uid's are just one part of an applications security context.  
There's an selinux context, all of the various namespaces, 
capabilities, etc.  If you use a daemon to launch a guest, you lose 
all of that unless you have a very sophisticated api.


True.  In a perfect world, we'd use SCM_RIGHTS to channel all of these 
to libvirt or qemud.


On the other hand, users don't want to do all these things by hand.  
They want management to do things for them.  Self launch is very 
flexible, but it's not an API, and cannot be used remotely.


We could use qemud plugins to allow the user to customize the launch 
process.




(2) If you want to implement a policy that only a single guest can 
access a single image, you can create an SELinux policy and use static 
labelling to achieve that.  That's just one type of policy though.


It's also not going to work in an environment that doesn't preserve all 
security labels (like direct access to volumes; /dev is on tmpfs these 
days).


(3) The system management application can certainly create whatever 
context it wants to launch a vm from.  It comes down to who's 
responsible for creating the context the guest runs under.  I think 
doing that at the libvirt level takes away a ton of flexibility from 
the management application.


If you want to push the flexibility slider all the way to the right you 
get bare qemu.  It exposes 100% of qemu capabilities.  And it's not so 
bad these days.  But it's not something that can be remoted.



--
error compiling committee.c: too many arguments to function



Re: [libvirt] [Qemu-devel] Re: Libvirt debug API

2010-04-25 Thread Avi Kivity

On 04/26/2010 04:53 AM, Anthony Liguori wrote:

On 04/25/2010 06:51 AM, Avi Kivity wrote:
It depends on what things you think are important.  A lot of 
libvirt's complexity is based on the fact that it uses a daemon and 
needs to deal with the security implications of that.  You don't 
need explicit labelling if you don't use a daemon. 



I don't follow.  If you have multiple guests that you want off each 
other's turf you have to label their resources, either statically or 
dynamically.  How is it related to a daemon being present?


Because libvirt has to perform this labelling because it loses the 
original user's security context.


If you invoke qemu with the original user's credentials that launched 
the guest, then you don't need to do anything special with respect to 
security.


IOW, libvirt does not run guests as separate users which is why it 
needs to deal with security in the first place.


What if one user has multiple guests?  isolation is still needed.

One user per guest does not satisfy some security requirements.  The 'M' 
in selinux stands for mandatory, which means that the entities secured 
can't leak information even if they want to (scenario: G1 breaks into 
qemu, chmods files, G2 breaks into qemu, reads files).




This is really the qemu model (as opposed to the xend model). 


(and the qemud model).


And I've said in the past that I don't like the idea of a qemud :-)


I must have missed it.  Why not?  Every other hypervisor has a central 
management entity.




In theory, it does support this with the session urls but they are 
currently second-class citizens in libvirt.  The remote dispatch 
also adds a fair bit of complexity and at least for the use-cases 
I'm interested in, it's not an important feature.


If libvirt needs a local wrapper for interesting use cases, then it 
has failed.  You can't have a local wrapper with the esx driver, for 
example.


This is off-topic, but can you detail why you don't want remote 
dispatch (I assume we're talking about a multiple node deployment).


Because there are dozens of remote management APIs and they all have a 
concept of agents that run on the end nodes.  When fitting 
virtualization management into an existing management infrastructure, 
you are going to always use a local API.


When you manage esx, do you deploy an agent?  I thought it was all done 
via their remote APIs.




Every typical virtualization use will eventually grow some 
non-typical requirements.  If libvirt explicitly refuses to support 
qemu features, I don't see how we can recommend it - even if it 
satisfies a user's requirements today, what about tomorrow? what 
about future qemu feature, will they be exposed or not?


If that is the case then we should develop qemud (which libvirt and 
other apps can use).


(even if it isn't the case I think qemud is a good idea)


Yeah, that's where I'm at.  I'd eventually like libvirt to use our 
provided API and I can see where it would add value to the stack (by 
doing things like storage and network management).


We do provide an API, qmp, and libvirt uses it?



That's not what the libvirt community wants to do.  We're very 
biased.  We've made decisions about how features should be exposed and 
what features should be included.  We want all of those features 
exposed exactly how we've implemented them because we think it's the 
right way.


I'm not sure there's an obvious way forward unless we decide that 
there is going to be two ways to interact with qemu.  One way is 
through the libvirt world-view and the other is through a more qemu 
centric view.  The problem then becomes allowing those two models to 
co-exist happily together.


I don't think there's a point in managing qemu through libvirt and 
directly in parallel.  It means a user has to learn both APIs, and 
for every operation they need to check both to see what's the best 
way of exploiting the feature.  There will invariably be some friction.


Layers need to stack on top of each other, not live side by side or 
bypass each other.


I agree with you theoretically but practically, I think it's immensely 
useful as a stop-gap.


Sure.  But please lets not start being clever with transactions and 
atomic operations and stuff, it has to come with a label that says, if 
you're using this, then something is wrong.





The alternative is to get libvirt to just act as a thin layer to 
expose qemu features directly.  But honestly, what's the point of 
libvirt if they did that? 


For most hypervisors, that's exactly what libvirt does.  For Xen, it 
also bypasses Xend and the hypervisor's API, but it shouldn't really.


Historically, xend was so incredibly slow (especially for frequent 
statistics collection) that it was a necessity.


Ah, reimplement rathe

Re: [libvirt] [Qemu-devel] Re: Libvirt debug API

2010-04-25 Thread Avi Kivity

On 04/23/2010 09:33 PM, Anthony Liguori wrote:
This is a different ambiguity, about the semantic results of the commands,
whereas I'm referring to the execution order. If I look at a libvirt log
file and see a set of JSON commands logged, I want to know that this ordering
from the logs was indeed the same as the order in which qemu processed them. If
you have two separate monitor connections you can't be sure of the order of
execution. It is key for our bug troubleshooting that, given a libvirt log
file, we can replay the JSON commands again and get the same results. Two
monitor connections is just increasing complexity of code without any
tangible benefit.

I think you're assuming direct access to the second monitor?  I'm not 
suggesting that.  I'm suggesting that libvirt is still the one 
submitting commands to the second monitor and that it submits those 
commands in lock step.




What about protocol extensions?  For instance, pretend libvirt doesn't 
support async messages, what would it do when it receives one from the 
user's monitor?


--
error compiling committee.c: too many arguments to function



Re: [libvirt] [Qemu-devel] Re: Libvirt debug API

2010-04-25 Thread Avi Kivity

On 04/25/2010 06:39 AM, Anthony Liguori wrote:

On 04/24/2010 04:46 AM, Avi Kivity wrote:

On 04/23/2010 09:29 PM, Anthony Liguori wrote:
Maybe.  We'll still have issues.  For example, sVirt: if a QMP 
command names a labeled resource, the non-libvirt user will have no 
way of knowing how to label it.



This is orthogonal to QMP and has to do strictly with how libvirt 
prepares a resource for qemu.



It's not orthogonal.  If you allow qmp access behind libvirt's back, 
it's a problem that you will have.


My point was, if libvirt is just exposing raw qemu features, then it 
should be possible for qemu to arbitrate concurrent access.  If 
libvirt implements features on top of qemu, then no other third party 
will be able to co-exist with those features without interacting with 
qemu.  It's an impossible problem for qemu to solve (arbitrating 
access to state stored in a third party management app).


If libvirt implements features (like sVirt or network configuration) then 
it is indeed impossible for qemu to arbitrate.  If we take all those 
features into qemu[d], then it becomes possible to arbitrate so long as 
the libvirt and the other management app don't step on each other's 
toes.  But that's impossible to guarantee if you upgrade your libvirt 
while keeping the other app unchanged.




1) Allow libvirt users to access features of qemu that are not 
exposed through libvirt


That's an artificial problem.  If libvirt exposes all features, you 
don't need to solve it.


It won't.  Otherwise, we wouldn't be having this discussion.


Then libvirt will fade into uselessness.  A successful app using libvirt 
will grow, and will have new requirements.  As soon as libvirt doesn't 
meet those new requirements, the app will need to talk to qemu[d] 
directly.  Once it does that, it may as well use qemu[d] for everything; 
if you can talk QMP and generate qemu command lines, there's not much 
that libvirt buys you.  Even the cross-hypervisor support is not that 
hard to implement, especially if you only need to satisfy your own 
requirements.







2) Provide a means for non-libvirt users to interact with qemu


We have qmp.  It doesn't do multiple guest management.  I think it's 
reasonable to have a qemud which does (and also does sVirt and the 
zillion other things libvirt does) provided we remove them from 
libvirt (long term).  The only problem is that it's a lot of effort.


It depends on what things you think are important.  A lot of libvirt's 
complexity is based on the fact that it uses a daemon and needs to 
deal with the security implications of that.  You don't need explicit 
labelling if you don't use a daemon. 


I don't follow.  If you have multiple guests that you want off each 
other's turf you have to label their resources, either statically or 
dynamically.  How is it related to a daemon being present?


This is really the qemu model (as opposed to the xend model). 


(and the qemud model).

In theory, it does support this with the session urls but they are 
currently second-class citizens in libvirt.  The remote dispatch also 
adds a fair bit of complexity and at least for the use-cases I'm 
interested in, it's not an important feature.


If libvirt needs a local wrapper for interesting use cases, then it has 
failed.  You can't have a local wrapper with the esx driver, for example.


This is off-topic, but can you detail why you don't want remote dispatch 
(I assume we're talking about a multiple node deployment).






3) Provide a unified and interoperable view of the world for 
non-libvirt and libvirt users


This problem can be solved by the non-libvirt users adopting libvirt, 
or the libvirt users dropping libvirt.  I don't understand why we 
need to add interoperability between users who choose an 
interoperability library and users who don't choose an 
interoperability library.


What I'd like to avoid is user confusion.  Should a user use libvirt 
or libqemu?  If they make a decision to use libqemu and then down the 
road want to use libvirt, how hard is it to switch?  Fragmentation 
hurts the ecosystem and discourages good applications from existing.  
I think it's our responsibility to ensure there's a good management 
API that exists for qemu that we can actively recommend to our users.  
libvirt is very good at typical virtualization uses of qemu but qemu 
is much more than just that and has lots of advanced features.


Every typical virtualization use will eventually grow some non-typical 
requirements.  If libvirt explicitly refuses to support qemu features, I 
don't see how we can recommend it - even if it satisfies a user's 
requirements today, what about tomorrow? what about future qemu feature, 
will they be exposed or not?


If that is the case then we should develop qemud (which libvirt and 
other apps can use).

Re: [libvirt] [Qemu-devel] Re: Libvirt debug API

2010-04-24 Thread Avi Kivity

On 04/23/2010 09:29 PM, Anthony Liguori wrote:
Maybe.  We'll still have issues.  For example, sVirt: if a QMP 
command names a labeled resource, the non-libvirt user will have no 
way of knowing how to label it.



This is orthogonal to QMP and has to do strictly with how libvirt 
prepares a resource for qemu.



It's not orthogonal.  If you allow qmp access behind libvirt's back, 
it's a problem that you will have.




Much better to exact a commitment from libvirt to track all QMP (and 
command line) capabilities.  Instead of adding cleverness to QMP, add 
APIs to libvirt.




Let's step back for a minute because I think we're missing the forest 
through the trees.


We're trying to address a few distinct problems:

1) Allow libvirt users to access features of qemu that are not exposed 
through libvirt


That's an artificial problem.  If libvirt exposes all features, you 
don't need to solve it.




2) Provide a means for non-libvirt users to interact with qemu


We have qmp.  It doesn't do multiple guest management.  I think it's 
reasonable to have a qemud which does (and also does sVirt and the 
zillion other things libvirt does) provided we remove them from libvirt 
(long term).  The only problem is that it's a lot of effort.




3) Provide a unified and interoperable view of the world for 
non-libvirt and libvirt users


This problem can be solved by the non-libvirt users adopting libvirt, or 
the libvirt users dropping libvirt.  I don't understand why we need to 
add interoperability between users who choose an interoperability 
library and users who don't choose an interoperability library.




For (1), we all agree that the best case scenario would be for libvirt 
to support every qemu feature.  I think we can also all agree though 
that this is not really practical and certainly not practical for 
developers since there is a development cost associated with libvirt 
support (to model an API appropriately).


All except me, perhaps.

We already have two layers of feature modeling: first, we mostly emulate 
real life, not invent new features.  PCI hotplug existed long before 
qemu had support for it.  Second, we do give some thought into how we 
expose it through QMP.  libvirt doesn't have to invent it again, it only 
has to expose it through its lovely xml and C APIs.




The new API proposed addresses (1) by allowing a user to drill down to 
the QMP context.  It's a good solution IMHO and I think we all agree 
that there's an inherent risk to this that users will have to evaluate 
on a case-by-case basis.  It's a good stop-gap though.


Agree.



(2) is largely addressed by QMP and a config file.  I'd like to see a 
nice C library, but I think a lot of other folks are happy with JSON 
support in higher level languages.


I agree with them.  C is a pretty bad choice for managing qemu (or even, 
C is a pretty bad choice).
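The JSON transport under discussion is small enough to sketch. QMP messages are JSON objects: the client sends {"execute": ...} commands (after a qmp_capabilities handshake) and must tell synchronous replies ("return"/"error") apart from asynchronous events ("event"). A minimal framing helper, independent of any socket; the command names are just the usual examples:

```python
import json

def qmp_command(name, arguments=None, cmd_id=None):
    """Serialize one QMP command as a JSON line."""
    msg = {"execute": name}
    if arguments:
        msg["arguments"] = arguments
    if cmd_id is not None:
        msg["id"] = cmd_id
    return json.dumps(msg) + "\r\n"

def classify(line):
    """Tag an incoming QMP message by kind."""
    msg = json.loads(line)
    if "QMP" in msg:
        return ("greeting", msg["QMP"])   # sent once, on connect
    if "event" in msg:
        return ("event", msg)             # delivered out of band
    if "error" in msg:
        return ("error", msg["error"])
    return ("reply", msg.get("return"))

# A session as a management app would drive it:
handshake = qmp_command("qmp_capabilities")
query = qmp_command("query-status", cmd_id=1)
kind, payload = classify('{"return": {"running": true, "status": "running"}}')
```

In a higher-level language this is essentially the whole "library"; a C equivalent mostly amounts to bundling a JSON parser, which is where the effort argument above comes from.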




(3) is the place where there are still potential challenges.  I think 
at the very least, our goal should be to enable conversion from (2) 
and (1) to be as easy as possible.  That's why I have proposed 
implementing a C library for the JSON transport because we could plumb 
that through the new libvirt API.  This would allow a user to very 
quickly port an application from QMP to libvirt.  In order to do this, 
we need the libvirt API to expose a dedicated monitor because we'll 
need to be able to manipulate events and negotiate features.


Most likely any application that talks QMP will hide the protocol behind 
a function call interface anyway.


Beyond simple porting, there's a secondary question of having 
non-libvirt apps co-exist with libvirt apps.  I think it's a good long 
term goal, but I don't think we should worry too much about it now.


libvirt needs to either support all but the most esoteric use cases, or 
to get out of the way completely.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [libvirt] [Qemu-devel] Re: Libvirt debug API

2010-04-23 Thread Avi Kivity

On 04/23/2010 04:48 PM, Anthony Liguori wrote:

On 04/23/2010 07:48 AM, Avi Kivity wrote:

On 04/22/2010 09:49 PM, Anthony Liguori wrote:
Another problem is issuing Monitor commands that could confuse libvirt's
real API. Say, adding a device libvirt doesn't know about or stopping the VM
while libvirt thinks it's still running or anything like that.


We need to make libvirt and qemu smarter.

We already face this problem today with multiple libvirt users.  
This is why sophisticated management mechanisms (like LDAP) have 
mechanisms to do transactions or at least a series of atomic 
operations.


And people said qmp/json was overengineered...

But seriously, transactions won't help anything.  qemu maintains 
state, and when you have two updaters touching a shared variable not 
excepting each other to, things break, no matter how much locking 
there is.


Let's consider some concrete examples.  I'm using libvirt and QMP and 
in QMP, I want to hot unplug a device.


Today, I do this by listing the pci devices, and issuing a pci_del 
that takes a PCI address.  This is intrinsically racy though because 
in the worst case scenario, in between when I enumerate pci devices 
and do the pci_del in QMP, in libvirt, I've done a pci_del and then a 
pci_add within libvirt of a completely different device.


Obviously you should do the pci_del through libvirt.  Once libvirt 
supports an API, use it.




There are a few ways to solve this, the simplest being that we give 
devices unique ids that are never reused and instead of pci_del taking 
a pci bus address, it takes a device id.  That would address this race.
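This is roughly the direction qemu did take, with user-assigned id= names and an id-based device_del. The contrast, sketched as monitor payloads — the address and id values are made up, and exact command names varied across qemu versions:

```python
import json

# Racy: names the device by bus address, which another client may have
# freed and reused between enumeration and deletion.
by_address = {"execute": "pci_del", "arguments": {"pci_addr": "00:04.0"}}

# Race-free: names the device by an id fixed at creation time.
by_id = {"execute": "device_del", "arguments": {"id": "net0"}}

wire = json.dumps(by_id)
```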


You can get very far by just being clever about unique ids and 
notifications.  There are some cases where a true RMW may be required 
but I can't really think of one off hand.  The way LDAP addresses this 
is that it has a batched operation and a simple set of boolean 
comparison operations.  This lets you execute a batched operation that 
will do a RMW.


I'm sure we can be very clever, but I'd rather direct this cleverness to 
qemu core issues, not to the QMP (which in turn requires that users be 
clever to use it correctly).  QMP is a low bandwidth protocol, so races 
will never show up in testing.  We're laying mines here for users to 
step on that we will never encounter ourselves.




  The only way that separate monitors could work is if they touch 
completely separate state, which is difficult to ensure if you 
upgrade your libvirt.




I don't think this is as difficult a problem as you think it is.  
If you look at Active Directory and the whole set of management tools 
based on it, they certainly allow concurrent management applications.  
You can certainly get into trouble still, but with just some careful 
considerations, you can make two management applications work together 
90% of the time without much fuss on the applications' part.


Maybe.  We'll still have issues.  For example, sVirt: if a QMP command 
names a labeled resource, the non-libvirt user will have no way of 
knowing how to label it.


Much better to exact a commitment from libvirt to track all QMP (and 
command line) capabilities.  Instead of adding cleverness to QMP, add 
APIs to libvirt.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [Qemu-devel] Re: Libvirt debug API

2010-04-23 Thread Avi Kivity

On 04/22/2010 09:49 PM, Anthony Liguori wrote:
Another problem is issuing Monitor commands that could confuse 
libvirt's real API. Say, adding a device libvirt doesn't know about, 
or stopping the VM while libvirt thinks it's still running, or 
anything like that.


We need to make libvirt and qemu smarter.

We already face this problem today with multiple libvirt users.  This 
is why sophisticated management mechanisms (like LDAP) have mechanisms 
to do transactions or at least a series of atomic operations.


And people said qmp/json was overengineered...

But seriously, transactions won't help anything.  qemu maintains state, 
and when you have two updaters touching a shared variable, each not 
expecting the other to, things break, no matter how much locking there 
is.  The only way that separate monitors could work is if they touch 
completely separate state, which is difficult to ensure if you upgrade 
your libvirt.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [libvirt] [Qemu-devel] Re: Supporting hypervisor specific APIs in libvirt

2010-03-26 Thread Avi Kivity

On 03/25/2010 10:18 AM, Alexander Graf wrote:



libqemu.so would be a C API.  C is not the first choice for writing GUIs or 
management applications.  So it would need to be further wrapped.

We also need to allow qemu to control the display directly, without going 
through vnc.
 

For the current functionality I tend to disagree. All that we need is a 
shm vnc extension that allows the GUI and qemu to not send image data 
over the wire, but only the dirtiness information.
   


It still means an extra copy.  I don't think we want to share the guest 
framebuffer (it includes offscreen bitmaps), so we'll need to copy it 
somewhere else.  It's even worse with qxl/spice where there is no 
framebuffer.



As soon as we get to 3D things might start to look different.
   


Very different.

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [libvirt] [Qemu-devel] Re: Supporting hypervisor specific APIs in libvirt

2010-03-26 Thread Avi Kivity

On 03/26/2010 10:37 AM, Markus Armbruster wrote:



The importances of libqemu is:

1) Providing a common QMP transport implementation that is extensible
by third parties
2) Providing a set of common transports that support automatic
discovery of command line launched guests
3) Providing a generic QMP dispatch function
 

Adding to this C wrappers for QMP commands threatens to make QMP command
arguments part of the library ABI.  Compatible QMP evolution (like
adding an optional argument) turns into a libqmp soname bump.
Counter-productive.  How do you plan to avoid that?
   


You could make the API use QObjects; then you're completely isolated 
from high level protocol changes.  Of course, this is less useful than 
the full API.
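The QObject idea can be caricatured in Python (command names and the wire list are made up for illustration; real QObjects are QEMU's C containers): if the wrapper passes commands and arguments as generic containers instead of per-command C functions, a new optional argument is just another dictionary key, not an ABI change.

```python
import json

def execute(wire, command, arguments=None):
    """Generic dispatch: the command name and its arguments travel as a
    plain dictionary (QObject-like), so adding an optional argument
    needs no new entry point and breaks no library ABI."""
    request = {"execute": command}
    if arguments:
        request["arguments"] = arguments
    wire.append(json.dumps(request))

wire = []
execute(wire, "device_del", {"id": "dev7"})
# a later protocol revision adds an optional flag; the call site shape
# is unchanged, so no soname bump is needed
execute(wire, "device_del", {"id": "dev8", "force": True})
```

The trade-off is exactly the one noted above: the caller gets no per-command type checking, which is why this is less useful than a full wrapped API.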


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [libvirt] [Qemu-devel] Re: Supporting hypervisor specific APIs in libvirt

2010-03-25 Thread Avi Kivity

On 03/25/2010 03:57 PM, Anthony Liguori wrote:

On 03/25/2010 08:48 AM, Avi Kivity wrote:


But an awful lot of the providers for pegasus are written in C.


But we're concerned with only one, the virt provider.  None of the 
others will use libqemu?


The point is, C is a lowest common denominator and it's important to 
support in a proper way.


Problem is, it means horrible support for everyone else.


Why?

We can provide a generic QMP dispatch interface that high level 
languages can use.  Then they can do fancy dispatch, treat QErrors as 
exceptions, etc.




Sure, with high level wrappers everything's fine.

--
error compiling committee.c: too many arguments to function



Re: [libvirt] [Qemu-devel] Re: Supporting hypervisor specific APIs in libvirt

2010-03-25 Thread Avi Kivity

On 03/25/2010 03:44 PM, Anthony Liguori wrote:

On 03/25/2010 07:37 AM, Avi Kivity wrote:

On 03/25/2010 02:33 PM, Anthony Liguori wrote:
From my point of view, I wouldn't want to write a high-level 
management toolstack in C, especially since the API is well-defined 
JSON, which is easily available in every high-level language out there.



There's a whole world of C based management toolstacks (CIM).



Gratefully I know very little about CIM, but isn't it language 
independent?


The prominent open source implementation, pegasus, is written in C++.


There is also SFCB which is written in C.


Ok.



But an awful lot of the providers for pegasus are written in C.


But we're concerned with only one, the virt provider.  None of the 
others will use libqemu?


The point is, C is a lowest common denominator and it's important to 
support in a proper way.


Problem is, it means horrible support for everyone else.

--
error compiling committee.c: too many arguments to function



Re: [libvirt] [Qemu-devel] Re: Supporting hypervisor specific APIs in libvirt

2010-03-25 Thread Avi Kivity

On 03/25/2010 02:33 PM, Anthony Liguori wrote:
From my point of view, I wouldn't want to write a high-level 
management toolstack in C, especially since the API is well-defined 
JSON, which is easily available in every high-level language out there.



There's a whole world of C based management toolstacks (CIM).



Gratefully I know very little about CIM, but isn't it language independent?

The prominent open source implementation, pegasus, is written in C++.

Or are you referring to specific management apps written in C?  If they 
go through CIM, how can they talk qmp?


--
error compiling committee.c: too many arguments to function



Re: [libvirt] [Qemu-devel] Re: Supporting hypervisor specific APIs in libvirt

2010-03-25 Thread Avi Kivity

On 03/25/2010 10:26 AM, Vincent Hanquez wrote:

On 24/03/10 21:40, Anthony Liguori wrote:

If so, what C clients you expected beyond libvirt?


Users want a C API.  I don't agree that libvirt is the only C 
interface consumer out there.


(I've seen this written too many times ...)
How do you know that ? did you do a poll or something where *actual* 
users vote/tell ?


From my point of view, I wouldn't want to write a high-level 
management toolstack in C, especially since the API is well-defined 
JSON, which is easily available in every high-level language out there.




Strongly agreed.  Even the managementy bits of qemu (anything around 
QObject) are suffering from the low-levelness of C.


--
error compiling committee.c: too many arguments to function



Re: [libvirt] [Qemu-devel] Re: Supporting hypervisor specific APIs in libvirt

2010-03-24 Thread Avi Kivity

On 03/24/2010 10:32 PM, Anthony Liguori wrote:


So far, a libqemu.so with a flexible transport that could be used 
directly by a libvirt user (ala cairo/gdk type interactions) seems 
like the best solution to me.



libqemu.so would be a C API.  C is not the first choice for writing GUIs 
or management applications.  So it would need to be further wrapped.


We also need to allow qemu to control the display directly, without 
going through vnc.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [libvirt] [Qemu-devel] Re: Supporting hypervisor specific APIs in libvirt

2010-03-24 Thread Avi Kivity

On 03/24/2010 06:42 PM, Luiz Capitulino wrote:

On Wed, 24 Mar 2010 12:42:16 +0200
Avi Kivity  wrote:

   

So, at best qemud is a toy for people who are annoyed by libvirt.
 

  Is the reason for doing this in qemu because libvirt is annoying?


Mostly.


I don't see
how adding yet another layer/daemon is going to improve our and users' lives
(the same applies to libqemu).
   


libvirt becomes optional.


  If I got it right, there were two complaints from the kvm-devel flamewar:

1. Qemu has usability problems
2. There's no way an external tool can get /proc/kallsyms info from Qemu

  I don't see how libqemu can help with 1) and having qemud doesn't seem
the best solution for 2) either.

  Still talking about 2), what's wrong in getting the PID or having a QMP
connection in a well known location as suggested by Anthony?
   


I now believe that's the best option.

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [libvirt] [Qemu-devel] Re: Supporting hypervisor specific APIs in libvirt

2010-03-24 Thread Avi Kivity

On 03/24/2010 02:32 PM, Anthony Liguori wrote:
You don't get a directory filled with a zillion socket files pointing 
at dead guests.  Agree that's a poor return on investment.



Deleting it on atexit combined with flushing the whole directory at 
startup is a pretty reasonable solution to this (which is ultimately 
how the entirety of /var/run behaves).


If you're really paranoid, you can fork() a helper with a shared pipe 
to implement unlink on close.


My paranoia comes nowhere near my dislike of forked helpers.
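For what it's worth, the fork()-helper-with-a-shared-pipe idea quoted above fits in a few lines of POSIX Python (illustrative only; a temp file stands in for the monitor socket file). The helper blocks on the read end of a pipe; when the parent exits, the kernel closes the write end, the read returns EOF, and the helper unlinks the file.

```python
import os
import tempfile

def unlink_on_close(path):
    """Forked helper: holds the read end of a pipe; when every copy of
    the write end dies with the parent, read() sees EOF and the helper
    unlinks the socket file. Returns (helper pid, write fd)."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:                     # helper child
        os.close(w)
        os.read(r, 1)                # blocks until the parent exits
        os.unlink(path)
        os._exit(0)
    os.close(r)
    return pid, w

fd, path = tempfile.mkstemp()        # stand-in for the monitor socket file
os.close(fd)
pid, w = unlink_on_close(path)
os.close(w)                          # simulate the parent exiting
os.waitpid(pid, 0)                   # helper has unlinked the file
```

This also covers abnormal exits: even if the parent crashes without running atexit handlers, the pipe still closes and the helper still cleans up.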

--
error compiling committee.c: too many arguments to function



Re: [libvirt] [Qemu-devel] Re: Supporting hypervisor specific APIs in libvirt

2010-03-24 Thread Avi Kivity

On 03/24/2010 02:30 PM, Paul Brook wrote:

On 03/23/2010 09:24 PM, Anthony Liguori wrote:
 

We also provide an API for guest creation (the qemu command line).
   

As an aside, I'd like to see all command line options have qmp
equivalents (most of them can be implemented with a 'set' command that
writes qdev values).  This allows a uniform way to control a guest,
whether at startup or runtime.  You start with a case, cold-plug a
motherboard, cpus, memory, disk controllers, and power it on.
 

The main blocker to this is converting all the devices to qdev. "partial"
conversions are not sufficient. It's approximately the same problem as a
machine config file. If you have one then the other should be fairly trivial.
   


Agreed.


IMO the no_user flag is a bug, and should not exist.
   


Sorry, what's that?

--
error compiling committee.c: too many arguments to function



Re: [libvirt] [Qemu-devel] Re: Supporting hypervisor specific APIs in libvirt

2010-03-24 Thread Avi Kivity

On 03/24/2010 02:30 PM, Anthony Liguori wrote:

On 03/24/2010 07:27 AM, Avi Kivity wrote:

On 03/24/2010 02:19 PM, Anthony Liguori wrote:

qemud
  - daemonizes itself
  - listens on /var/lib/qemud/guests for incoming guest connections
  - listens on /var/lib/qemud/clients for incoming client connections
  - filters access according to uid (SCM_CREDENTIALS)
  - can pass a new monitor to client (SCM_RIGHTS)
  - supports 'list' command to query running guests
  - async messages on guest startup/exit



Then guests run with the wrong security context.


Why?  They run with the security context of whoever launched them 
(could be libvirtd).


Because it doesn't have the same security context as qemud and since 
clients have to connect to qemud, qemud has to implement access control.


Yeah.

It's far better to have the qemu instance advertise itself such that 
a client connects directly to it.  Then all of the various 
authorization models will be applied correctly to it.


Agreed.  qemud->exit().

--
error compiling committee.c: too many arguments to function



Re: [libvirt] [Qemu-devel] Re: Supporting hypervisor specific APIs in libvirt

2010-03-24 Thread Avi Kivity

On 03/24/2010 02:23 PM, Anthony Liguori wrote:

On 03/24/2010 05:42 AM, Avi Kivity wrote:



The filtering access part of this daemon is also not mapping well onto
libvirt's access model, because we don't solely filter based on UID in
libvirtd. We have it configurable based on UID, policykit, SASL, TLS/x509
already, and intend to add role-based access control to further filter
things, integrating with the existing apparmor/selinux security models.
A qemud that filters based on UID only gives users a side-channel to get
around libvirt's access control.


That's true.  Any time you write a multiplexer these issues crop up.  
Much better to stay in single process land where everything is 
already taken care of.


What does a multiplexer give you that making individual qemu instances 
discoverable doesn't give you?  The later doesn't suffer from these 
problems.




You don't get a directory filled with a zillion socket files pointing at 
dead guests.  Agree that's a poor return on investment.


Maybe we want an O_UNLINK_ON_CLOSE for unix domain sockets - but no, 
that's not implementable.


--
error compiling committee.c: too many arguments to function



Re: [libvirt] [Qemu-devel] Re: Supporting hypervisor specific APIs in libvirt

2010-03-24 Thread Avi Kivity

On 03/24/2010 02:19 PM, Anthony Liguori wrote:

qemud
  - daemonizes itself
  - listens on /var/lib/qemud/guests for incoming guest connections
  - listens on /var/lib/qemud/clients for incoming client connections
  - filters access according to uid (SCM_CREDENTIALS)
  - can pass a new monitor to client (SCM_RIGHTS)
  - supports 'list' command to query running guests
  - async messages on guest startup/exit



Then guests run with the wrong security context.


Why?  They run with the security context of whoever launched them (could 
be libvirtd).


--
error compiling committee.c: too many arguments to function



Re: [libvirt] [Qemu-devel] Re: Supporting hypervisor specific APIs in libvirt

2010-03-24 Thread Avi Kivity

On 03/24/2010 12:36 PM, Daniel P. Berrange wrote:

On Wed, Mar 24, 2010 at 07:17:26AM +0200, Avi Kivity wrote:
   

On 03/23/2010 08:00 PM, Avi Kivity wrote:
 

On 03/23/2010 06:06 PM, Anthony Liguori wrote:
   

I thought the monitor protocol *was* our API. If not, why not?
   

It is.  But our API is missing key components like guest
enumeration.  So the fundamental topic here is, do we introduce these
missing components to allow people to build directly to our interface
or do we make use of the functionality that libvirt already provides
if they can plumb our API directly to users.

 

Guest enumeration is another API.

Over the kvm call I suggested a qemu concentrator that would keep
track of all running qemus, and would hand out monitor connections to
users.  It can do the enumeration (likely using qmp).  Libvirt could
talk to that, like it does with other hypervisors.

   

To elaborate

qemud
   - daemonizes itself
   - listens on /var/lib/qemud/guests for incoming guest connections
   - listens on /var/lib/qemud/clients for incoming client connections
   - filters access according to uid (SCM_CREDENTIALS)
   - can pass a new monitor to client (SCM_RIGHTS)
   - supports 'list' command to query running guests
   - async messages on guest startup/exit
 

My concern is that once you provide this, then next someone wants it to
list inactive guests too.


That's impossible, since qemud doesn't manage config files or disk 
images.  It can't even launch guests!



Once you list inactive guests, then you'll
want this to start a guest. Once you start guests, then you want cgroups
integration, selinux labelling & so on, until it ends up replicating all
of libvirt's QEMU functionality.

To be able to use the list functionality from libvirt, we need this daemon
to also guarantee id, name & uuid uniqueness for all VMs, both running and
inactive, with separate namespaces for the system vs per-user lists. Or
we have to ignore any instances listed by qemud that were not started by
libvirt, which rather defeats the purpose.
   


qemud won't guarantee name uniqueness or provide uuids.


The filtering access part of this daemon is also not mapping well onto
libvirt's access model, because we don't solely filter based on UID in
libvirtd. We have it configurable based on UID, policykit, SASL, TLS/x509
already, and intend to add role-based access control to further filter
things, integrating with the existing apparmor/selinux security models.
A qemud that filters based on UID only gives users a side-channel to get
around libvirt's access control.
   


That's true.  Any time you write a multiplexer these issues crop up.  
Much better to stay in single process land where everything is already 
taken care of.


So, at best qemud is a toy for people who are annoyed by libvirt.

--
error compiling committee.c: too many arguments to function



Re: [libvirt] [Qemu-devel] Re: Supporting hypervisor specific APIs in libvirt

2010-03-23 Thread Avi Kivity

On 03/23/2010 09:24 PM, Anthony Liguori wrote:


We also provide an API for guest creation (the qemu command line).



As an aside, I'd like to see all command line options have qmp 
equivalents (most of them can be implemented with a 'set' command that 
writes qdev values).  This allows a uniform way to control a guest, 
whether at startup or runtime.  You start with a case, cold-plug a 
motherboard, cpus, memory, disk controllers, and power it on.


I would also like a way to read the entire qdev tree from qmp.
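To make the 'set' idea concrete, here is a toy model (device names, paths, and properties are invented for illustration, not real qdev): if the whole machine is one readable tree, a single generic set command plus a query that dumps the tree covers most of what the command line configures today.

```python
# toy qdev tree: the whole machine is a nested dictionary, so one
# generic 'set' command (path + property + value) can stand in for
# most command-line options, and 'query' dumps the entire tree
qtree = {
    "i440fx": {
        "piix3-ide": {"unit": 0},
        "virtio-net": {"mac": "52:54:00:12:34:56", "vlan": 0},
    }
}

def qdev_set(tree, path, prop, value):
    """Walk a slash-separated device path and write one property."""
    node = tree
    for part in path.split("/"):
        node = node[part]
    node[prop] = value

def qdev_query(tree):
    """Return the whole tree; a real monitor would serialize it as JSON."""
    return tree

# runtime reconfiguration through the same interface as cold-plug
qdev_set(qtree, "i440fx/virtio-net", "vlan", 1)
```

The same two primitives serve both cold-plug at startup and hotplug at runtime, which is the uniformity being asked for above.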

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [libvirt] [Qemu-devel] Re: Supporting hypervisor specific APIs in libvirt

2010-03-23 Thread Avi Kivity

On 03/23/2010 08:00 PM, Avi Kivity wrote:

On 03/23/2010 06:06 PM, Anthony Liguori wrote:

I thought the monitor protocol *was* our API. If not, why not?


It is.  But our API is missing key components like guest 
enumeration.  So the fundamental topic here is, do we introduce these 
missing components to allow people to build directly to our interface 
or do we make use of the functionality that libvirt already provides 
if they can plumb our API directly to users.




Guest enumeration is another API.

Over the kvm call I suggested a qemu concentrator that would keep 
track of all running qemus, and would hand out monitor connections to 
users.  It can do the enumeration (likely using qmp).  Libvirt could 
talk to that, like it does with other hypervisors.




To elaborate

qemud
  - daemonizes itself
  - listens on /var/lib/qemud/guests for incoming guest connections
  - listens on /var/lib/qemud/clients for incoming client connections
  - filters access according to uid (SCM_CREDENTIALS)
  - can pass a new monitor to client (SCM_RIGHTS)
  - supports 'list' command to query running guests
  - async messages on guest startup/exit

qemu
  - with -qemud option, connects to qemud (or maybe automatically?)

qemudc
  - command-line client, can access qemu human monitor
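The "pass a new monitor to client" step above relies on SCM_RIGHTS file-descriptor passing over a Unix socket, which can be demonstrated directly in Python (the qemud/client roles are hypothetical; a pipe stands in for a guest's monitor channel):

```python
import array
import os
import socket

def hand_monitor_to_client(server_sock, monitor_fd):
    """qemud side: attach the open monitor fd as SCM_RIGHTS ancillary
    data; the kernel installs a duplicate in the receiving process."""
    server_sock.sendmsg(
        [b"monitor"],
        [(socket.SOL_SOCKET, socket.SCM_RIGHTS,
          array.array("i", [monitor_fd]).tobytes())])

def receive_monitor(client_sock):
    """Client side: pull the passed fd out of the control message."""
    msg, ancdata, flags, addr = client_sock.recvmsg(
        16, socket.CMSG_SPACE(array.array("i", [0]).itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            return array.array("i", data)[0]

# demo over a socketpair; a pipe stands in for the guest's monitor
qemud_end, client_end = socket.socketpair(socket.AF_UNIX)
r, w = os.pipe()
hand_monitor_to_client(qemud_end, w)
monitor = receive_monitor(client_end)
os.write(monitor, b"(qemu) ")
banner = os.read(r, 7)
```

The UID filtering mentioned above would use SO_PEERCRED/SCM_CREDENTIALS on the same socket before deciding whether to hand the fd over.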

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [libvirt] [Qemu-devel] Supporting hypervisor specific APIs in libvirt

2010-03-23 Thread Avi Kivity

On 03/23/2010 09:31 PM, Anthony Liguori wrote:




One problem is that this is libvirt version-specific.  For example, 
libvirt x doesn't support spice so we control that through qmp.  But 
libvirt x+1 does support spice and now it gets confused about all the 
spice messages.


That's only a problem if we only support a single QMP session.  This 
is exactly why we need to support multiple QMP sessions (and do).


It's unrelated to the number of sessions.  libvirt expects state that it 
manages in qemu not to change randomly.  Users know that, so they will 
only manage non-libvirt state in their private session.  But a new 
version of libvirt may expand its scope and start managing this area, 
leading to conflicts.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [libvirt] [Qemu-devel] Re: Supporting hypervisor specific APIs in libvirt

2010-03-23 Thread Avi Kivity

On 03/23/2010 08:23 PM, Daniel P. Berrange wrote:

On Tue, Mar 23, 2010 at 08:00:21PM +0200, Avi Kivity wrote:
   

On 03/23/2010 06:06 PM, Anthony Liguori wrote:
 

I thought the monitor protocol *was* our API. If not, why not?
 

It is.  But our API is missing key components like guest enumeration.
So the fundamental topic here is, do we introduce these missing
components to allow people to build directly to our interface or do we
make use of the functionality that libvirt already provides if they
can plumb our API directly to users.

   

Guest enumeration is another API.

Over the kvm call I suggested a qemu concentrator that would keep track
of all running qemus, and would hand out monitor connections to users.
It can do the enumeration (likely using qmp).  Libvirt could talk to
that, like it does with other hypervisors.
 

The libvirt QEMU driver started out as a fairly simple "concentrator" not
doing much beyond spawning QEMU with argv & issuing monitor commands. The
host concentrator inevitably needs to be involved in the OS-level integration
with features such as cgroups, selinux/apparmor, host NIC management,
storage, iptables, etc. If you look at the daemons for Xen, VirtualBox, and
VMWare that other libvirt drivers talk to, they all do far more than
just enumeration of VMs. A QEMU concentrator may start out simple, but it will
end up growing over time to re-implement much, if not all, of the stuff that
libvirt already provides for QEMU in terms of host-level APIs.


The idea is not to replace libvirt, but to provide something that sits 
underneath.  It wouldn't provide any non-qemu host-level APIs.



  If the core
problem here is to provide app developers access to the full range of QEMU
functionality, then re-implementing the entirety of the libvirt QEMU driver
is a rather over-the-top way to achieve that.
   


It's trivial to expose all qemu functionality by exposing a qmp connection.

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [libvirt] [Qemu-devel] Re: Supporting hypervisor specific APIs in libvirt

2010-03-23 Thread Avi Kivity

On 03/23/2010 06:06 PM, Anthony Liguori wrote:

I thought the monitor protocol *was* our API. If not, why not?


It is.  But our API is missing key components like guest enumeration.  
So the fundamental topic here is, do we introduce these missing 
components to allow people to build directly to our interface or do we 
make use of the functionality that libvirt already provides if they 
can plumb our API directly to users.




Guest enumeration is another API.

Over the kvm call I suggested a qemu concentrator that would keep track 
of all running qemus, and would hand out monitor connections to users.  
It can do the enumeration (likely using qmp).  Libvirt could talk to 
that, like it does with other hypervisors.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [libvirt] [Qemu-devel] Supporting hypervisor specific APIs in libvirt

2010-03-23 Thread Avi Kivity

On 03/22/2010 09:25 PM, Anthony Liguori wrote:

Hi,

I've mentioned this to a few folks already but I wanted to start a 
proper thread.


We're struggling in qemu with usability and one area that concerns me 
is the disparity in features that are supported by qemu vs what's 
implemented in libvirt.


This isn't necessarily libvirt's problem if its mission is to provide 
a common hypervisor API that covers the most commonly used features.


However, for qemu, we need an API that covers all of our features that 
people can develop against.  The ultimate question we need to figure 
out is, should we encourage our users to always use libvirt or should 
we build our own API for people (and libvirt) to consume.


I don't think it's necessarily a big technical challenge for libvirt 
to support qemu more completely.  I think it amounts to introducing a 
series of virQemu APIs that implement qemu specific functions.  
Over time, qemu specific APIs can be deprecated in favour of more 
generic virDomain APIs.


What's the feeling about this from the libvirt side of things?  Is 
there interest in supporting hypervisor-specific interfaces, or should 
we be looking to provide our own management interface for libvirt to 
consume?




One option is to expose a qmp connection to the client.  Of course that 
introduces a consistency problem (libvirt plugs in a card, the user plugs 
in its own, libvirt is confused).  If the user promises to behave, it can 
work for stuff that's 100% orthogonal to libvirt.
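The shape of such a raw QMP session is simple: the server sends a greeting, and the client must leave capabilities-negotiation mode before issuing commands. A sketch of the client half follows; the "server" below is a canned single-threaded stand-in, not qemu, and the greeting fields are abbreviated.

```python
import json
import socket

def qmp_handshake(sock):
    """Client side of a QMP session: read the greeting, then send
    qmp_capabilities to leave negotiation mode; returns the line-based
    file object and the reply to the capabilities command."""
    f = sock.makefile("rw", buffering=1)
    greeting = json.loads(f.readline())
    assert "QMP" in greeting            # peer speaks QMP
    f.write(json.dumps({"execute": "qmp_capabilities"}) + "\n")
    return f, json.loads(f.readline())

# canned stand-in for qemu: greeting and the capabilities reply are
# pre-written so this single-threaded demo cannot deadlock
client, server = socket.socketpair()
srv = server.makefile("rw", buffering=1)
srv.write(json.dumps({"QMP": {"version": {}, "capabilities": []}}) + "\n")
srv.write(json.dumps({"return": {}}) + "\n")

f, reply = qmp_handshake(client)
first_cmd = json.loads(srv.readline())  # what the "server" received
```

After the handshake, every further command is one JSON object per direction on the same channel, which is why exposing the raw connection exposes everything.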


One problem is that this is libvirt version-specific.  For example, 
libvirt x doesn't support spice so we control that through qmp.  But 
libvirt x+1 does support spice and now it gets confused about all the 
spice messages.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



[libvirt] Re: [Qemu-devel] [PATCH 1/6] Allow multiple monitor devices (v2)

2009-05-12 Thread Avi Kivity

Anthony Liguori wrote:

Hollis Blanchard wrote:

On Wed, 2009-04-08 at 13:34 -0500, Anthony Liguori wrote:
 
Right now only one monitor device can be enabled at a time.  In 
order to support
asynchronous notification of events, I would like to introduce a 
'wait' command
that waits for an event to occur.  This implies that we need an 
additional
monitor session to allow commands to still be executed while waiting 
for an

asynchronous notification.



Was there any consensus reached in this thread? I'm once again looking
for ways to communicate qemu watchdog events to libvirt.
  


We can do multiple monitors as a debugging tool, but to support 
events, a proper machine monitor mode is a prerequisite.


The real requirement is that events are obtainable via a single 
communication channel instead of requiring two separate communication 
channels.  Internal implementation will look at lot like these patches.


The reasoning for requiring a single channel is that coordinating 
between the two channels is expected to be prohibitively difficult.  
To have a single channel, we need a machine mode.  It cannot be done 
in a human readable fashion.


I think this summarizes the consensus we reached.  I don't agree fully 
with the above but I'm okay with it.


If you don't agree with it, it isn't a consensus.


Would you agree Avi?


It represents my views fairly accurately.  I'm not convinced that you 
can't do event notifications without machine mode, but on the other hand 
I do think introducing machine mode and layering notifications on top of 
that is the best way to proceed, so I can't complain.


--
error compiling committee.c: too many arguments to function



Re: [libvirt] Re: [Qemu-devel] [PATCH 1/6] Allow multiple monitor devices (v2)

2009-04-16 Thread Avi Kivity

Jamie Lokier wrote:

Avi Kivity wrote:
  

Daniel P. Berrange wrote:

Yes indeed its a little crazy :-)  As anthony mentioned if libvirt were 
able to be notified of changes a user makes in the monitor, there's no 
reason we could not allow end users to access the monitor of a VM 
libvirt is managing. We just need to make sure libvirt doesn't miss

changes like attaching or detaching block devices, etc, because that'll
cause crash/data loss later when libvirt migrates or does save/restore,
etc because it'll launch QEMU with wrong args
 
  

You still have an inherent race here.

user: plug in disk
libvirt: start migration, still without disk
qemu: libvirt, a disk has been plugged in.



Then fix it.  The race is not necessary.

user: plug in a disk
libvirt: lock VM against user changes incompatible with migration
qemu: libvirt, lock granted
libvirt: query for current disk state
libvirt: start migration, knows about the disk

The "libvirt, a disk has been plugged in" will be delivered but it's
not important.  libvirt queries the state of things after it acquires
the lock and before it starts migration.

  


Migration is supposed to be transparent.  You're reducing quality of 
service if you're disabling features while migrating.


That means that to debug a problem in the field you have to locate a 
guest's host, and follow it around as it migrates (or disable migration).



That's right you do.  Is there any way to debug a guest without
disabling migration?  I don't think there is at present, so of course
you have to disable migration when you debug.  Another reason for that
"lock against migration" mentioned above.
  


Nothing prevents you from debugging a guest during migration.  You'll 
have to reconnect to the monitor, but that's it.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [libvirt] Re: [Qemu-devel] [PATCH 1/6] Allow multiple monitor devices (v2)

2009-04-14 Thread Avi Kivity

Daniel P. Berrange wrote:

On Tue, Apr 14, 2009 at 12:15:23PM +0300, Avi Kivity wrote:
  

Daniel P. Berrange wrote:

Yes indeed its a little crazy :-)  As anthony mentioned if libvirt were 
able to be notified of changes a user makes in the monitor, there's no 
reason we could not allow end users to access the monitor of a VM 
libvirt is managing. We just need to make sure libvirt doesn't miss

changes like attaching or detaching block devices, etc, because that'll
cause crash/data loss later when libvirt migrates or does save/restore,
etc because it'll launch QEMU with wrong args
 
  

You still have an inherent race here.

user: plug in disk
libvirt: start migration, still without disk
qemu: libvirt, a disk has been plugged in.



That is true, but we'd still be considering direct monitor access to
be an 'expert' mode of use. If they wish to shoot themselves in
the foot by triggering a migration at the same time they are hotplugging,
I'm fine if their whole leg gets blown away.  
  


What if the system triggers migration automatically (as you'd expect)?

And that's just one example. I'm sure there are more. libvirt issues 
commands expecting some state in qemu. It can't learn of that state from 
listening on another monitor, because there are delays between the state 
changing and the notification.


If you want things to work reliably, you have to follow the chain of 
command.
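As an illustration of why the chain of command matters, here is a toy model of the hotplug-versus-migration race above (the names are hypothetical, not libvirt's API): a hotplug that bypasses the management layer leaves its cached device list stale at migration time, while a mediated hotplug cannot.

```python
class Qemu:
    """Bare stand-in for a qemu instance's device state."""
    def __init__(self):
        self.disks = []

class Manager:
    """Stand-in for libvirt: a cached view drives the migration command line."""
    def __init__(self, qemu):
        self.qemu = qemu
        self.cached_disks = []

    def hotplug(self, disk):
        # chain of command: the manager makes the change itself,
        # so its cache can never lag behind the real state
        self.qemu.disks.append(disk)
        self.cached_disks.append(disk)

    def migration_args(self):
        return list(self.cached_disks)

qemu = Qemu()
mgr = Manager(qemu)

qemu.disks.append("vdb")           # user hotplugs behind the manager's back
stale = mgr.migration_args()       # [] -- this migration would lose vdb

mgr.hotplug("vdc")                 # mediated hotplug stays consistent
consistent = mgr.migration_args()  # ["vdc"]
```

The toy version makes the race one line long; in reality the window is the time between libvirt snapshotting guest state and the notification arriving.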


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [libvirt] Re: [Qemu-devel] [PATCH 1/6] Allow multiple monitor devices (v2)

2009-04-14 Thread Avi Kivity

Jan Kiszka wrote:

That is true, but we'd still be considering direct monitor access to
be a 'expert' user mode of use. If they wish to shoot themselves in
the foot by triggering a migration at same time they are hotplugging
I'm fine if their whole leg gets blown away.  



...while there is also nothing that speaks against blocking any device
hot-plugging while migration is ongoing. Independent of if there is some
management app involved or the user himself plays with multiple monitors.

  


If the management is doing the hotplugging, it should just do it on both 
sides.



--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [libvirt] Re: [Qemu-devel] [PATCH 1/6] Allow multiple monitor devices (v2)

2009-04-14 Thread Avi Kivity

Daniel P. Berrange wrote:
Yes indeed it's a little crazy :-)  As Anthony mentioned, if libvirt were 
able to be notified of changes a user makes in the monitor, there's no 
reason we could not allow end users to access the monitor of a VM 
libvirt is managing. We just need to make sure libvirt doesn't miss

changes like attaching or detaching block devices, etc, because that'll
cause crash/data loss later when libvirt migrates or does save/restore,
etc because it'll launch QEMU with wrong args
  


You still have an inherent race here.

user: plug in disk
libvirt: start migration, still without disk
qemu: libvirt, a disk has been plugged in.


I don't see how adding those low-level monitor things to libvirt is
an improvement - debugging and scripted keystrokes are not the sort of
functionality libvirt is for - or is it?



I think it could probably be argued that sending fake keystrokes could
be within scope. Random ad-hoc debugging probably out of scope.
  


That means that to debug a problem in the field you have to locate a 
guest's host, and follow it around as it migrates (or disable migration).


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



[libvirt] Re: [Qemu-devel] [PATCH 1/6] Allow multiple monitor devices (v2)

2009-04-11 Thread Avi Kivity

Anthony Liguori wrote:

Avi Kivity wrote:

Anthony Liguori wrote:

IMHO, multiple monitors is a critical feature to support in the long 
term.


Multiple monitors are nice to have (for developers), but I don't see 
them as critical.


If you live in a world where there is a single management application 
that provides the only interface to interact with a QEMU instance, 
then yes, they aren't critical.




I do (or at least I hope I do).  Exposing the monitor to users is a 
layering violation.


The problem with this is that most management applications are lossy 
by their nature.  They expose only a subset of functionality supported 
by QEMU.


What if they don't expose a feature because they don't want to make the 
feature available to the user?


What happens when the user changes something that the management 
application thinks it controls?  Do we add notifiers on everything?


The qemu monitor is a different privilege level from being a virtual 
machine owner.  Sure, we could theoretically plug all the holes, for 
example the user filling up the disk with screendumps.  But do we 
want to reduce security this way?


You're taking away control from the management application because of 
the management application's misfeatures.  You should instead tell the 
vendor of your management application to add the missing feature.


Oh, and don't expect users of a management application to connect to the 
qemu monitor to administer their virtual machines.  They expect the 
management application to do that for them.  The qemu monitor is an 
excellent way to control a single VM, but not for controlling many.




Currently, the monitor is the "management interface" for QEMU.  If we 
only ever support one instance of that management interface, then it 
means if multiple management applications are to interact with a given 
QEMU instance, they must all use a single API to do that then allows 
for multiplexing.  I see no reason that QEMU shouldn't do the 
multiplexing itself though.


Again, I don't oppose multiplexing (though I do oppose the wait command 
which requires it, and I oppose this "management apps suck, let's telnet 
to qemu directly" use you propose).




To put it another way, a user that uses libvirt today cannot see QEMU 
instances that are run manually.  That is not true when a user uses 
libvirt with Xen today because Xend provides a management interface 
that is capable of supporting multiple clients.  I think it's 
important to get the same level of functionality for QEMU.


N.B. yes, Xend is a horrendous example especially when your argument 
has been simplicity vs. complexity.


I'm sure libvirt really enjoys it when users use xm commands to change 
the VM state.  What happens when you migrate it, for example?  Or add a 
few dozen vcpus?




At the end of the day, I want to be able to run a QEMU instance from 
the command line, and have virt-manager be able to see it remotely and 
connect to it.  That means multiple monitors and it means that all 
commands that change VM state must generate some sort of notification 
such that libvirt can keep track of the changing state of a VM. 


I don't think most management application authors would expose the qemu 
monitor to users.  It sounds like a huge risk, and for what benefit?  If 
there's something interesting you can do with the monitor, add it to the 
management interface so people can actually use it.  They don't buy this 
stuff so they can telnet into the monitor.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



[libvirt] Re: [Qemu-devel] [PATCH 1/6] Allow multiple monitor devices (v2)

2009-04-11 Thread Avi Kivity

Anthony Liguori wrote:


What's the established practice?  Do you know of any protocol that is 
line based that does notifications like this?




Actually there is one line oriented protocol that does asynchronous 
notifications.


http://faqs.org/rfcs/rfc1459.html
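IRC is indeed a working precedent: replies and asynchronous server messages share one line-oriented stream, distinguished by the message's command field. A minimal classifier sketch (simplified; real RFC 1459 parsing handles more cases):

```python
def classify(line):
    """RFC 1459-style framing: an optional ':' prefix, then a command.
    Numeric commands are replies to something the client sent; anything
    else (PRIVMSG, PING, ...) is an asynchronous notification."""
    parts = line.split()
    i = 1 if parts and parts[0].startswith(":") else 0  # skip the prefix
    cmd = parts[i] if len(parts) > i else ""
    return "reply" if cmd.isdigit() else "event"
```

A client can therefore interleave its own command/response traffic with server-initiated events on one connection, which is exactly the property under discussion here.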


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



[libvirt] Re: [Qemu-devel] [PATCH 1/6] Allow multiple monitor devices (v2)

2009-04-11 Thread Avi Kivity

Anthony Liguori wrote:

Avi Kivity wrote:

(qemu) notify vnc on

... time passes, we want to allow members of group x to log in

(qemu) vnc_set_acl group:x
OK
(qemu)
notification: vnc connect aliguori
(qemu)

with a single monitor, we can be sure that the connect happened after the 
vnc_set_acl.  If the notification arrives on a different session, we 
have no way of knowing that.


Only because there isn't a time stamp associated with the completion 
of the other monitor command.  And you can globally replace timestamp 
with some sort of incrementing id that's associated with each 
notification and command completion.


Sure, you can fix the problem, but why introduce it in the first place?

I understand the urge for a simple command/response, but introducing 
multiple sessions breaks the "simple" and introduces new problems.
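The incrementing-id alternative quoted above could be sketched like this (hypothetical, not qemu code): one process-wide counter stamps every command completion and every notification, so logs taken on different sessions can be merged into a total order without wall-clock timestamps.

```python
import itertools

_seq = itertools.count(1)   # one counter per qemu process, shared by sessions

class Session:
    """A monitor session whose completions and notifications carry ids."""

    def __init__(self):
        self.log = []

    def complete(self, cmd):
        self.log.append((next(_seq), "ok", cmd))

    def notify(self, event):
        self.log.append((next(_seq), "event", event))

s1, s2 = Session(), Session()
s1.complete("vnc_set_acl group:x")   # id 1, seen on session 1
s2.notify("vnc connect aliguori")    # id 2, seen on session 2

# merging by id reconstructs the global order across sessions
merged = sorted(s1.log + s2.log)
```

This recovers the ordering guarantee, at the cost of making every client do the merge that a single session would give for free.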




You'll need this to support multiple monitors even with your model.  


Can you explain why?  As far as I can tell, if you have async 
notifications, you can do everything from one monitor.


IMHO, multiple monitors is a critical feature to support in the long 
term.


Multiple monitors are nice to have (for developers), but I don't see 
them as critical.


I expect that in the short term future, we'll have a non-human 
monitor mode that allows commands to be asynchronous.


Then let's defer this until then?  'wait' is not useful for humans, 
they won't be retyping 'wait' every time something happens.


But wait is useful for management apps today.  A wait-forever, which 
is already in the next series, is also useful for humans.  It may not 
be a perfect interface, but it's a step in the right direction.  We 
have time before the next release and I expect that we'll have a 
non-human mode before then.


I disagree, I think requiring multiple sessions for controlling a single 
application is clumsy.  I can't think of one protocol which uses it.  I 
don't think IMAP requires multiple sessions (and I don't think commands 
from one session can affect the other, except through the mail store).




What's the established practice?  Do you know of any protocol that 
is line based that does notifications like this?


I guess most MUDs?


I've never used a MUD before, I think that qualifies as before my time 
:-)


Well I haven't either.  Maybe time to start.



IMAP IDLE is pretty close to "wait-forever".


IMAP IDLE can be terminated by the client, and so does not require 
multiple sessions (though IMAP supports them).


Most modern clients use multiple sessions.  If you look at IMAP, it 
doesn't multiplex commands so multiple sessions are necessary to 
maintain usefulness while performing a long running task.


But commands in one session don't affect others.



Anyway, I think terminating a wait is a perfectly reasonable requirement.


It breaks your command/response, though.

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



[libvirt] Re: [Qemu-devel] [PATCH 1/6] Allow multiple monitor devices (v2)

2009-04-09 Thread Avi Kivity

Anthony Liguori wrote:

Avi Kivity wrote:
Fine, let's say we did that, it's *still* racy because at time 3, 
the guest may hot remove cpu 2 on its own since the guest's VCPUs 
get to run in parallel to the monitor.


A guest can't hotremove a vcpu.  It may offline a vcpu, but that's 
not the same.


Obviously, if both the guest and the management application can 
initiate the same action, then there will be races.  But I don't 
think that's how things should be -- the guest should request a vcpu 
to be removed (or added), management thinks and files forms in 
triplicate, then hotadds or hotremoves the vcpu (most likely after it 
is no longer needed).


With the proper bureaucracy, there is no race.


You still have the same basic problem:

time 0: (qemu) notify-enable vnc-events
time 1: (qemu) foo 
time 4: 
time 5: notification: client connected

time 0: vnc client connects
time 2: vnc client disconnects

At time 5, when the management app gets the notification, it cannot 
make any assumptions about the state of the system.  You still need 
timestamps.


You don't even need the foo  to trigger this, qemu->user 
traffic can be arbitrarily delayed (I don't think we should hold 
notifications on partial input anyway).  But there's no race here.


The notification at time 5 means that the connect happened sometime 
before time 5, and that it may not be true now.  The user cannot assume 
anything.  A race can only happen against something the user initiated.


Suppose we're implementing some kind of single sign on:


(qemu) notify vnc on

... time passes, we want to allow members of group x to log in

(qemu) vnc_set_acl group:x
OK
(qemu)
notification: vnc connect aliguori
(qemu)

with a single monitor, we can be sure that the connect happened after the 
vnc_set_acl.  If the notification arrives on a different session, we 
have no way of knowing that.




And even if you somehow eliminate the issue around masking 
notifications, you still have socket buffering that introduces the 
same problem.


If you have one monitor, the problem is much simpler, since events 
travelling in the same direction (command acknowledge and a 
notification) cannot be reordered.  With a command+wait, the problem 
is inherent.


Command acknowledge is not an event.  Events are out-of-band.  Command 
completions are in-band.  Right now, they are synchronous and


That's all correct, but I don't see how that changes anything.

I expect that in the short term future, we'll have a non-human monitor 
mode that allows commands to be asynchronous.


Then let's defer this until then?  'wait' is not useful for humans, they 
won't be retyping 'wait' every time something happens.




However, it's a mistake to muddle the distinction between an in-band 
completion and an out-of-band event.  You cannot relate the 
out-of-band events to commands.


I can, if one happens before the other, and I have a single stream of 
command completions and event notifications.
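That single-stream inference can be sketched directly (hypothetical names): because acknowledgements and events travel through the same FIFO, anything the client reads before a command's "ok" is known to predate that command's effect.

```python
from collections import deque

stream = deque()   # the single monitor connection: one FIFO for everything

def run_command(name):
    # the command takes effect atomically, then its acknowledgement
    # is appended to the same stream as the events
    stream.append(("ok", name))

def raise_event(event):
    stream.append(("event", event))

raise_event("vnc connect alice")     # happened before the command ran
run_command("vnc_set_acl group:x")
raise_event("vnc connect bob")       # happened after

# drain: events seen before the "ok" predate the command's effect
it = iter(stream)
before_ok = []
for kind, payload in it:
    if kind == "ok":
        break
    before_ok.append(payload)
after_ok = [payload for kind, payload in it]
```

No timestamps or sequence ids are needed; the ordering is a property of the channel itself.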






The best you can do is stick a time stamp on a notification and make 
sure the management tool understands that the notification is 
reflective of the state when the event happened, not of the 
current state.  


Timestamps are really bad.   They don't work at all if the management 
application is not on the same host.  They work badly if it is on the 
same host, since commands and events will be timestamped at different 
processes.


Timestamps are relative, not absolute.  They should not be used to 
associate anything with the outside world.  In fact, I have no problem 
making the timestamps relative to QEMU startup just to ensure that 
no one tries to do something silly like associate notification 
timestamps with system time.


Dunno, seems totally artificial to me to have to introduce timestamps to 
compensate for different delays in multiple sockets that we introduced 
five patches earlier.


Please, let's keep this simple.



FWIW, this problem is not at all unique to QEMU and is generally 
true of most protocols that support an out-of-band notification 
mechanism.




command+wait makes it worse.  Let's stick with established practice.


What's the established practice?  Do you know of any protocol that is 
line based that does notifications like this?


I guess most MUDs?



IMAP IDLE is pretty close to "wait-forever".


IMAP IDLE can be terminated by the client, and so does not require 
multiple sessions (though IMAP supports them).




--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



[libvirt] Re: [Qemu-devel] [PATCH 1/6] Allow multiple monitor devices (v2)

2009-04-09 Thread Avi Kivity

Anthony Liguori wrote:

Avi Kivity wrote:
Suppose you have a command which changes the meaning of a 
notification.  If a notification arrives before the command 
completion, then it happened before the command was executed.


If you want to make that reliable, you cannot have multiple monitors.  


Right.

Since you can mask notifications, there can be an arbitrarily long 
time between notification and the event happening.  Socket buffering 
presents the same problem.  Imagine:


Monitor 1:
time 0: (qemu) hotadd_cpu 2
time 1: (qemu) hello world 
time 5: 
time 6: notification: cpu 2 added
time 6: (qemu)

Monitor 2:
time 3: (qemu) hotremove_cpu 2
time 4: (qemu)
time 5: notification: cpu 2 removed
time 6: (qemu)

So to eliminate this, you have to ban multiple monitors.  


Well, not ban multiple monitors, but require that for non-racy operation 
commands and notifications be on the same session.


We can still debug on our dev-only monitor.

Fine, let's say we did that, it's *still* racy because at time 3, the 
guest may hot remove cpu 2 on its own since the guest's VCPUs get to 
run in parallel to the monitor.


A guest can't hotremove a vcpu.  It may offline a vcpu, but that's not 
the same.


Obviously, if both the guest and the management application can initiate 
the same action, then there will be races.  But I don't think that's how 
things should be -- the guest should request a vcpu to be removed (or 
added), management thinks and files forms in triplicate, then hotadds or 
hotremoves the vcpu (most likely after it is no longer needed).


With the proper bureaucracy, there is no race.



And even if you somehow eliminate the issue around masking 
notifications, you still have socket buffering that introduces the 
same problem.


If you have one monitor, the problem is much simpler, since events 
travelling in the same direction (command acknowledge and a 
notification) cannot be reordered.  With a command+wait, the problem is 
inherent.




The best you can do is stick a time stamp on a notification and make 
sure the management tool understands that the notification is 
reflective of the state when the event happened, not of the current 
state.  


Timestamps are really bad.   They don't work at all if the management 
application is not on the same host.  They work badly if it is on the 
same host, since commands and events will be timestamped at different 
processes.


FWIW, this problem is not at all unique to QEMU and is generally true 
of most protocols that support an out-of-band notification mechanism.




command+wait makes it worse.  Let's stick with established practice.

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [libvirt] Re: [Qemu-devel] [PATCH 1/6] Allow multiple monitor devices (v2)

2009-04-09 Thread Avi Kivity

Jan Kiszka wrote:

I'm not sure I understand your questions.  Multiple monitor sessions are
like multiple shell sessions.  I don't think a control program should
use more than one session, but it should allow a developer to connect to
issue 'info registers' and 'x/20i' commands.  Of course if a developer
issues 'quit' or a hotunplug command, things will break.



We agree if we want decoupled states of the monitor sessions (one
session should definitely not be used to configure the output of another
one). But I see no issues with collecting the events in one session that
happen to be caused by activity in some other session. But maybe I'm
missing your point.
  


The management application will still think the device is plugged in, 
and things will break if it isn't.


Of course if you asked for notification X on session Y, then event X 
should be delivered to session Y no matter how it originated (but not to 
session Z).


  

Please no more async notifications to the monitors. They are just ugly
to parse, at least for us humans. I don't want to see any notification
in the middle of my half-typed command e.g.
  
  

If we can identify an interactive session, we might redraw the partial
command after the prompt.



Uhh, please not this kind of terminal reprinting. The terminal user must
keep full control over when things can be printed.
  


Very well.  I guess a human user can open another session for 
notifications, if they are so inclined.


  

btw, why would a human enable notifications?  Note notifications enabled
on the management session will only be displayed there.



It's true that the common use case for events will be monitor
applications. Still, enabling them for testing or simple scripting
should not switch on ugly output mode or complicate the parsing.
  


Fair enough.

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



[libvirt] Re: [Qemu-devel] [PATCH 1/6] Allow multiple monitor devices (v2)

2009-04-09 Thread Avi Kivity

Anthony Liguori wrote:

Avi Kivity wrote:

I'm sorry, I don't see why.  It's just like a shell session.

Compare with:

Monitor 1:
(qemu) notify enospace on
(qemu) notify vnc-connect on
(qemu) notify migration-completion on
(qemu) migrate -d ...
(qemu) migrate_cancel
(qemu) migrate -d ...


Monitor 2:
(qemu) wait
vnc connection ...
(qemu) wait
enospc on ide0-0
(qemu) wait
migration cancelled
(qemu) wait
notification: migration completed

There is no way to tell by looking what has happened (well, in this 
case you can, but in the general case you cannot).  You have to look 
at two separate interactive sessions (ctrl-alt-2 ctrl-alt-3 
ctrl-alt-3).  You have to keep reissuing the wait command.  Oh, and 
it's racy, so if you're interested in what really happens you have to 
issue info commands on session 1.


How is it less racy?



Suppose you have a command which changes the meaning of a notification.  
If a notification arrives before the command completion, then it 
happened before the command was executed.  If it arrives after command 
completion, then it happened after the command was executed.


Oh.  If the command generates no output (like most), you can't tell when 
it completes.  I suppose we could have qemu print OK after completing a 
command.
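As a sketch of how an in-band OK plus idle-only notifications could fit together (hypothetical, not qemu's actual monitor code): the monitor is busy while a command runs, events raised in that window are queued, and they are delivered only once the completion marker has been printed and the monitor is idle again.

```python
class LineMonitor:
    """Sketch of a line-oriented monitor: 'OK' acknowledges each command
    in-band, and queued notifications are flushed only when idle."""

    def __init__(self):
        self.pending = []   # notifications raised while a command runs
        self.out = []       # what the client would see, line by line

    def notify(self, event):
        self.pending.append(f"notification: {event}")

    def feed_line(self, line, handler):
        handler(line, self)            # busy: events queue up in pending
        self.out.append("OK")          # completion marker ends the command
        self.out.extend(self.pending)  # idle again: deliver queued events
        self.pending.clear()
        self.out.append("(qemu) ")     # prompt marks the idle state

def handler(line, mon):
    # stand-in command dispatch; 'migrate' raises an event mid-command
    if line.startswith("migrate"):
        mon.notify("migration started")

mon = LineMonitor()
mon.feed_line("migrate -d tcp:host:4444", handler)
# mon.out is ["OK", "notification: migration started", "(qemu) "]
```

The client then has an unambiguous rule: everything up to OK answers the command, everything between OK and the next prompt is asynchronous.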


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [libvirt] Re: [Qemu-devel] [PATCH 1/6] Allow multiple monitor devices (v2)

2009-04-09 Thread Avi Kivity

Jan Kiszka wrote:

Avi Kivity wrote:
  

Gerd Hoffmann wrote:


On 04/09/09 16:03, Avi Kivity wrote:
  

I don't want multiplexed monitor sessions, at all.


I'm very happy to finally see them.  Finally one can run vms with
libvirt and *still* access the monitor for debugging and development
purposes.

  

Right, I like them for that purpose as well.  But not for ordinary control.



How do you want to differentiate? What further complications would this
bring us?
  


I'm not sure I understand your questions.  Multiple monitor sessions are 
like multiple shell sessions.  I don't think a control program should 
use more than one session, but it should allow a developer to connect to 
issue 'info registers' and 'x/20i' commands.  Of course if a developer 
issues 'quit' or a hotunplug command, things will break.





Please no more async notifications to the monitors. They are just ugly
to parse, at least for us humans. I don't want to see any notification
in the middle of my half-typed command e.g.
  


If we can identify an interactive session, we might redraw the partial 
command after the prompt.


btw, why would a human enable notifications?  Note notifications enabled 
on the management session will only be displayed there.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



[libvirt] Re: [Qemu-devel] [PATCH 1/6] Allow multiple monitor devices (v2)

2009-04-09 Thread Avi Kivity

Anthony Liguori wrote:

Avi Kivity wrote:


I'd make everything line-oriented.  Anything from the user up to \n 
is buffered and ignored until the \n arrives.


Once the \n arrives, the command is acted upon atomically, either 
completing fully or launching an async notification.


So the rules are: whenever the monitor is idle, a notification can be 
printed out.


So by idle, you mean the output buffer ends in either '\n' 
or '\n(qemu) '.  The input buffer must also be empty.


You don't have to look any buffers.  If the monitor is processing a 
command, it is busy.  An asynchronous command ('migrate -d') is not 
processed in the monitor after it is launched, so it doesn't keep the 
monitor busy.  A monitor enters idle after printing the prompt, and 
leaves idle when it starts processing a command.


If you meant from the user side, a notification always follows the prompt.




(qemu) notify enospace on
(qemu) notify vnc-connect on
(qemu)
notification: vnc connection ...
(qemu) notify migration-completion on
(qemu) migrate -d ...
notification: enospc on ide0-0
(qemu) migrate_cancel
notification: migration cancelled
(qemu) migrate -d ...
(qemu)
notification: migration completed 


This hurts my eyes.  It's not human readable.


I'm sorry, I don't see why.  It's just like a shell session.

Compare with:

Monitor 1:
(qemu) notify enospace on
(qemu) notify vnc-connect on
(qemu) notify migration-completion on
(qemu) migrate -d ...
(qemu) migrate_cancel
(qemu) migrate -d ...


Monitor 2:
(qemu) wait
vnc connection ...
(qemu) wait
enospc on ide0-0
(qemu) wait
migration cancelled
(qemu) wait
notification: migration completed

There is no way to tell by looking what has happened (well, in this case 
you can, but in the general case you cannot).  You have to look at two 
separate interactive sessions (ctrl-alt-2 ctrl-alt-3 ctrl-alt-3).  You 
have to keep reissuing the wait command.  Oh, and it's racy, so if 
you're interested in what really happens you have to issue info commands 
on session 1.


That's unusable.

  If we're going to do this, we might as well have a non-human mode 
which would oddly enough be more human readable.  If you do this, then 
your session looks an awful lot like my session from a previous note.


I think we should.



I think the thing that is missing is that the 'wait' command does not 
have to be part of the non-human mode.  In non-human mode, you are 
always doing an implicit wait.




I think 'wait' is unusable for humans.  If I want qemu to tell me 
something happened, it's enough to enable notifications.  There's no 
need to tell it to wait every time something happens.  That's poll(2), 
there's no poll(1).


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



[libvirt] Re: [Qemu-devel] [PATCH 1/6] Allow multiple monitor devices (v2)

2009-04-09 Thread Avi Kivity

Anthony Liguori wrote:

Avi Kivity wrote:


(qemu) notify enospace on
(qemu) notify vnc-connect on
(qemu)
notification: vnc connection ...
(qemu) notify migration-completion on
(qemu) migrate -d ...
notification: enospc on ide0-0
(qemu) migrate_cancel
notification: migration cancelled
(qemu) migrate -d ...
(qemu)
notification: migration completed


What are the rules for printing out 'notification'?  Do you want for 
the end of the buffer to be '\n' or '\n(qemu) '.  If so, if I type:


(qemu) f

But don't hit enter, would that suppress notification?



I'd make everything line-oriented.  Anything from the user up to \n is 
buffered and ignored until the \n arrives.


Once the \n arrives, the command is acted upon atomically, either 
completing fully or launching an async notification.


So the rules are: whenever the monitor is idle, a notification can be 
printed out.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.


