Re: [vdsm] RFC: New Storage API
- Original Message - > From: "Shu Ming" > To: "Deepak C Shetty" > Cc: "Saggi Mizrahi" , "engine-devel" > , "VDSM Project Development" > > Sent: Friday, December 7, 2012 1:37:20 AM > Subject: Re: [vdsm] RFC: New Storage API > > 于 2012-12-7 13:23, Deepak C Shetty: > > On 12/06/2012 10:22 PM, Saggi Mizrahi wrote: > >> > >> - Original Message - > >>> From: "Shu Ming" > >>> To: "Saggi Mizrahi" > >>> Cc: "VDSM Project Development" > >>> , > >>> "engine-devel" > >>> Sent: Thursday, December 6, 2012 11:02:02 AM > >>> Subject: Re: [vdsm] RFC: New Storage API > >>> > >>> Saggi, > >>> > >>> Thanks for sharing your thought and I get some comments below. > >>> > >>> > >>> Saggi Mizrahi: > I've been throwing a lot of bits out about the new storage API > and > I think it's time to talk a bit. > I will purposefully try and keep implementation details away and > concentrate about how the API looks and how you use it. > > First major change is in terminology, there is no long a storage > domain but a storage repository. > This change is done because so many things are already called > domain in the system and this will make things less confusing > for > new-commers with a libvirt background. > > One other changes is that repositories no longer have a UUID. > The UUID was only used in the pool members manifest and is no > longer needed. > > > connectStorageRepository(repoId, repoFormat, > connectionParameters={}): > repoId - is a transient name that will be used to refer to the > connected domain, it is not persisted and doesn't have to be the > same across the cluster. > repoFormat - Similar to what used to be type (eg. localfs-1.0, > nfs-3.4, clvm-1.2). > connectionParameters - This is format specific and will used to > tell VDSM how to connect to the repo. > >>> > >>> Where does repoID come from? I think repoID doesn't exist before > >>> connectStorageRepository() return. Isn't repoID a return value > >>> of > >>> connectStorageRepository()? > >> No, repoIDs are no longer part of the domain, they are just a > >> transient handle. > >> The user can put whatever it wants there as long as it isn't > >> already > >> taken by another currently connected domain. > > > > So what happens when user mistakenly gives a repoID that is in use > > before.. there should be something in the return value that > > specifies > > the error and/or reason for error so that user can try with a > > new/diff > > repoID ? > > I think let the user to give the repoID is meaningless and > error-prune. The repo ID is meaningless, it's just a handle to the instance. It's never persisted to disk and doesn't have to be unique across the cluster. > Developer must maintain a a unique ID list for every storage > repository > connected. Why? You could just use repoId = ___ "ovirt_example.com_hosting_3" as an example If you agree with all other users of the same VDSM instance that you are going to use this scheme you can cooperate. You can use whatever scheme you want to make sure you don't hit anyone else's. The point is, VDSM doesn't care how you provision repoIDs and how unique they are across the cluster. It's the user's choice how and if to persist this information. > > > > disconnectStorageRepository(self, repoId) > > > In the new API there are only images, some images are mutable > and > some are not. > mutable images are also called VirtualDisks > immutable images are also called Snapshots > > There are no explicit templates, you can create as many images > as > you want from any snapshot. 
> There are 4 major image operations:
>
> createVirtualDisk(targetRepoId, size, baseSnapshotId=None, userData={}, options={}):
> targetRepoId - ID of a connected repo where the disk will be created
> size - the size of the image you wish to create
> baseSnapshotId - the ID of the snapshot you want to base the new virtual disk on
> userData - optional data that will be attached to the new VD; could be anything that the user desires.
> options - options to modify VDSM's default behavior
> >
> > IIUC, I can use options to do storage offloads? For example, I can create a LUN that represents this VD on my storage array based on the 'options' parameter? Is this the intended way to use 'options'?
>
> returns the ID of the new VD
> >>> I think we will also need a function to check if a VirtualDisk is based on a specific snapshot.
> >>> Like: isSnapshotOf(virtualDiskId, baseSnapshotID):
> >> No, the design is that volume dependencies are an implementation detail.
> >> There is no reason for you to know that an image is physically a snapshot of another.
> >> Logical snapshots, template information, and any other information can be set by the user by using the userData field available for every image.
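To make the calling convention above concrete, here is a minimal sketch of a caller using the proposed names. It is an illustration only: the FakeVdsm class is a hypothetical stand-in for a real client binding, the connection parameters are made up, and the repoId naming scheme simply mirrors the "ovirt_example.com_hosting_3" example and the connect-fails-if-taken behaviour described in this thread.

import uuid

class FakeVdsm:
    """Hypothetical stand-in for a VDSM client binding; only the call
    shapes come from the RFC, the transport does not exist here."""
    def __init__(self):
        self._repos = {}
        self._disks = {}

    def connectStorageRepository(self, repoId, repoFormat, connectionParameters={}):
        # repoId is a transient, caller-chosen handle; connecting fails
        # if that handle is already in use on this VDSM instance.
        if repoId in self._repos:
            raise RuntimeError("repoId %r already in use" % repoId)
        self._repos[repoId] = (repoFormat, connectionParameters)

    def createVirtualDisk(self, targetRepoId, size, baseSnapshotId=None,
                          userData={}, options={}):
        vd_id = str(uuid.uuid4())
        self._disks[vd_id] = dict(repo=targetRepoId, size=size,
                                  base=baseSnapshotId, userData=userData)
        return vd_id  # the RFC says the new VD's ID is returned

vdsm = FakeVdsm()
# Caller-chosen naming scheme, e.g. "<manager>_<host>_<index>":
repo_id = "ovirt_example.com_hosting_3"
vdsm.connectStorageRepository(repo_id, "nfs-3.4",
                              {"server": "nas.example.com", "export": "/vol/repo"})
disk_id = vdsm.createVirtualDisk(repo_id, size=20 * 1024**3,
                                 userData={"purpose": "demo"})
print(disk_id)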
Re: [vdsm] RFC: New Storage API
- Original Message - > From: "Deepak C Shetty" > To: "Saggi Mizrahi" > Cc: "Shu Ming" , "engine-devel" > , "VDSM Project Development" > , "Deepak C Shetty" > > Sent: Friday, December 7, 2012 12:23:15 AM > Subject: Re: [vdsm] RFC: New Storage API > > On 12/06/2012 10:22 PM, Saggi Mizrahi wrote: > > > > - Original Message - > >> From: "Shu Ming" > >> To: "Saggi Mizrahi" > >> Cc: "VDSM Project Development" > >> , "engine-devel" > >> > >> Sent: Thursday, December 6, 2012 11:02:02 AM > >> Subject: Re: [vdsm] RFC: New Storage API > >> > >> Saggi, > >> > >> Thanks for sharing your thought and I get some comments below. > >> > >> > >> Saggi Mizrahi: > >>> I've been throwing a lot of bits out about the new storage API > >>> and > >>> I think it's time to talk a bit. > >>> I will purposefully try and keep implementation details away and > >>> concentrate about how the API looks and how you use it. > >>> > >>> First major change is in terminology, there is no long a storage > >>> domain but a storage repository. > >>> This change is done because so many things are already called > >>> domain in the system and this will make things less confusing for > >>> new-commers with a libvirt background. > >>> > >>> One other changes is that repositories no longer have a UUID. > >>> The UUID was only used in the pool members manifest and is no > >>> longer needed. > >>> > >>> > >>> connectStorageRepository(repoId, repoFormat, > >>> connectionParameters={}): > >>> repoId - is a transient name that will be used to refer to the > >>> connected domain, it is not persisted and doesn't have to be the > >>> same across the cluster. > >>> repoFormat - Similar to what used to be type (eg. localfs-1.0, > >>> nfs-3.4, clvm-1.2). > >>> connectionParameters - This is format specific and will used to > >>> tell VDSM how to connect to the repo. > >> > >> Where does repoID come from? I think repoID doesn't exist before > >> connectStorageRepository() return. Isn't repoID a return value of > >> connectStorageRepository()? > > No, repoIDs are no longer part of the domain, they are just a > > transient handle. > > The user can put whatever it wants there as long as it isn't > > already taken by another currently connected domain. > > So what happens when user mistakenly gives a repoID that is in use > before.. there should be something in the return value that specifies > the error and/or reason for error so that user can try with a > new/diff > repoID ? Asi I said, connect fails if the repoId is in use ATM. > > >>> disconnectStorageRepository(self, repoId) > >>> > >>> > >>> In the new API there are only images, some images are mutable and > >>> some are not. > >>> mutable images are also called VirtualDisks > >>> immutable images are also called Snapshots > >>> > >>> There are no explicit templates, you can create as many images as > >>> you want from any snapshot. > >>> > >>> There are 4 major image operations: > >>> > >>> > >>> createVirtualDisk(targetRepoId, size, baseSnapshotId=None, > >>> userData={}, options={}): > >>> > >>> targetRepoId - ID of a connected repo where the disk will be > >>> created > >>> size - The size of the image you wish to create > >>> baseSnapshotId - the ID of the snapshot you want the base the new > >>> virtual disk on > >>> userData - optional data that will be attached to the new VD, > >>> could > >>> be anything that the user desires. > >>> options - options to modify VDSMs default behavior > > IIUC, i can use options to do storage offloads ? For eg. 
> I can create a LUN that represents this VD on my storage array based on the 'options' parameter? Is this the intended way to use 'options'?

No, this has nothing to do with offloads.
If by "offloads" you mean having other VDSM hosts do the heavy lifting, then that is what the autoFix=False option and the fix mechanism are for.
If you are talking about advanced SCSI features (e.g. WRITE SAME), they will be used automatically whenever possible.
In any case, how we manage LUNs (if they are even used) is an implementation detail.

> >>>
> >>> returns the ID of the new VD
> >> I think we will also need a function to check if a VirtualDisk is based on a specific snapshot.
> >> Like: isSnapshotOf(virtualDiskId, baseSnapshotID):
> > No, the design is that volume dependencies are an implementation detail.
> > There is no reason for you to know that an image is physically a snapshot of another.
> > Logical snapshots, template information, and any other information can be set by the user by using the userData field available for every image.
> >>> createSnapshot(targetRepoId, baseVirtualDiskId, userData={}, options={}):
> >>> targetRepoId - the ID of a connected repo where the new snapshot will be created and where the original image exists as well.
> >>> size - the size of the image you wish to create
> >>> baseVirtualDisk - the ID of a mutable image (Virtual Disk)
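A minimal sketch of the userData idea discussed above: since the API deliberately exposes no isSnapshotOf(), the caller records logical lineage itself in the data it owns. The key names used here ("role", "logicalBase") are invented for the example and are not part of any VDSM schema.

def make_user_data(role, logical_base=None, extra=None):
    """Build a userData dict recording the *logical* lineage of an image."""
    data = {"role": role, "logicalBase": logical_base}
    if extra:
        data.update(extra)
    return data

# Creating a snapshot of disk "vd-1" and remembering, on the caller's side,
# that it is logically based on that disk:
snap_user_data = make_user_data(role="snapshot", logical_base="vd-1")

# A later "is it based on X?" check is then a lookup in data the caller owns,
# not a VDSM call:
def is_logically_based_on(user_data, image_id):
    return user_data.get("logicalBase") == image_id

assert is_logically_based_on(snap_user_data, "vd-1")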
Re: [vdsm] moving the collection of statistics to external process
On 12/07/2012 12:39 PM, Mark Wu wrote:
On 12/06/2012 11:29 PM, Adam Litke wrote:
On Thu, Dec 06, 2012 at 11:19:34PM +0800, Shu Ming wrote:
On 2012-12-6 4:51, Itamar Heim wrote:
On 12/05/2012 10:33 PM, Adam Litke wrote:
On Wed, Dec 05, 2012 at 10:21:39PM +0200, Itamar Heim wrote:
On 12/05/2012 10:16 PM, Adam Litke wrote:
On Wed, Dec 05, 2012 at 09:01:24PM +0200, Itamar Heim wrote:
On 12/05/2012 08:57 PM, Adam Litke wrote:
On Wed, Dec 05, 2012 at 08:30:10PM +0200, Itamar Heim wrote:
On 12/05/2012 04:42 PM, Adam Litke wrote:

I wanted to know what you think about it and if you have a better solution to avoid initiating so many threads. And is splitting vdsm a good idea here? At first look, my opinion is that it can help, and it would be nice to have a vmStatisticService that runs and writes the VMs' status to a separate log.

Vdsm recently started requiring the MOM package. MOM also performs some host and guest statistics collection as part of the policy framework. I think it would be a really good idea to consolidate all stats collection into MOM. Then all stats become usable within the policy and by vdsm for its own internal purposes. Today, MOM has one stats collection thread per VM and one thread for the host stats. It has an API for gathering the most recently collected stats which vdsm can use.

Isn't this what collectd (and its libvirt plugin) or pcp are already doing?

Lots of things collect statistics, but as of right now we're using MOM and we're not yet using collectd on the host, right?

I think we should have a single stats collection service and clients for it. I think mom and vdsm should get their stats from that service, rather than have either beholden to any new stats something needs to collect.

How would this work for collecting guest statistics? Would we require collectd to be installed in all guests running under oVirt?

My understanding is that collectd is installed on the host and uses collectd's libvirt plugin to collect guest statistics?

Yes, but some statistics can only be collected by making a call to the oVirt guest agent (e.g. guest memory statistics). The logical next step would be to write a collectd plugin for ovirt-guest-agent, but vdsm owns the connections to the guest agents and probably does not want to multiplex those connections for many reasons (security being the main one).

And some will come from qemu-ga, which libvirt will support? Maybe a collectd vdsm plugin for the guest agent stats?

I am thinking of having collectd as a stand-alone service to collect the statistics from both ovirt-guest-agent and qemu-ga. Then collectd can export the information to the host proc file system in a layered architecture. Then MOM or other vdsm services can get the information from the proc file system like other OS statistics exported on the host.

You wouldn't use the host /proc filesystem for this purpose. /proc is an interface between userspace and the kernel. It is not for direct application use. The problem I see with hooking collectd up to ovirt-ga is that vdsm still needs a connection to ovirt-ga for things like shutdown and desktopLogin. Today vdsm owns the connection to the guest agent, and there is not a nice way to multiplex that connection for use by multiple clients simultaneously.

Actually, I don't like collecting statistics from the guest agent. Now libvirt can provide the statistics for vcpu, block and network interfaces. So I think we should reconsider enabling the guest memory report in the virtio balloon driver. I am not sure if async events are supported in QMP now.
What do you think of it?

In vdsm and mom we don't just collect statistics; we also need to perform appropriate actions on them. So we probably still need an output plugin for collectd to make the data available to vdsm and mom, and to generate an event to vdsm or mom when the data reaches a given threshold. Just an idea; I am not sure how easy it is to implement.

Should be easy for such stats; the question is what other items are reported by the current guest agent (say, the list of installed applications).
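The "output plugin for collectd" idea above can be sketched with collectd's Python plugin interface. This is only an illustration: the Unix socket path, the assumption that vdsm or MOM listen on it, and the threshold policy are all made up for the example.

# Runs inside collectd's Python plugin; not vdsm or MOM code.
import json
import socket

import collectd  # provided by collectd's python plugin, not installable via pip

THRESHOLDS = {"memory": 0.90}            # placeholder policy, not a real limit
SOCK_PATH = "/var/run/vdsm/stats.sock"   # hypothetical consumer socket

def _send(event):
    s = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        s.sendto(json.dumps(event).encode("utf-8"), SOCK_PATH)
    except socket.error:
        collectd.warning("stats consumer not reachable")
    finally:
        s.close()

def write_cb(vl, data=None):
    # Forward every sample so vdsm/MOM can read it...
    sample = {"host": vl.host, "plugin": vl.plugin,
              "type": vl.type, "values": list(vl.values)}
    _send({"kind": "sample", "data": sample})
    # ...and raise an event when a watched metric crosses its threshold.
    limit = THRESHOLDS.get(vl.type)
    if limit is not None and vl.values and vl.values[0] > limit:
        _send({"kind": "threshold", "data": sample, "limit": limit})

collectd.register_write(write_cb)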
Re: [vdsm] moving the collection of statistics to external process
On 12/06/2012 11:29 PM, Adam Litke wrote:
On Thu, Dec 06, 2012 at 11:19:34PM +0800, Shu Ming wrote:
On 2012-12-6 4:51, Itamar Heim wrote:
On 12/05/2012 10:33 PM, Adam Litke wrote:
On Wed, Dec 05, 2012 at 10:21:39PM +0200, Itamar Heim wrote:
On 12/05/2012 10:16 PM, Adam Litke wrote:
On Wed, Dec 05, 2012 at 09:01:24PM +0200, Itamar Heim wrote:
On 12/05/2012 08:57 PM, Adam Litke wrote:
On Wed, Dec 05, 2012 at 08:30:10PM +0200, Itamar Heim wrote:
On 12/05/2012 04:42 PM, Adam Litke wrote:

I wanted to know what you think about it and if you have a better solution to avoid initiating so many threads. And is splitting vdsm a good idea here? At first look, my opinion is that it can help, and it would be nice to have a vmStatisticService that runs and writes the VMs' status to a separate log.

Vdsm recently started requiring the MOM package. MOM also performs some host and guest statistics collection as part of the policy framework. I think it would be a really good idea to consolidate all stats collection into MOM. Then all stats become usable within the policy and by vdsm for its own internal purposes. Today, MOM has one stats collection thread per VM and one thread for the host stats. It has an API for gathering the most recently collected stats which vdsm can use.

Isn't this what collectd (and its libvirt plugin) or pcp are already doing?

Lots of things collect statistics, but as of right now we're using MOM and we're not yet using collectd on the host, right?

I think we should have a single stats collection service and clients for it. I think mom and vdsm should get their stats from that service, rather than have either beholden to any new stats something needs to collect.

How would this work for collecting guest statistics? Would we require collectd to be installed in all guests running under oVirt?

My understanding is that collectd is installed on the host and uses collectd's libvirt plugin to collect guest statistics?

Yes, but some statistics can only be collected by making a call to the oVirt guest agent (e.g. guest memory statistics). The logical next step would be to write a collectd plugin for ovirt-guest-agent, but vdsm owns the connections to the guest agents and probably does not want to multiplex those connections for many reasons (security being the main one).

And some will come from qemu-ga, which libvirt will support? Maybe a collectd vdsm plugin for the guest agent stats?

I am thinking of having collectd as a stand-alone service to collect the statistics from both ovirt-guest-agent and qemu-ga. Then collectd can export the information to the host proc file system in a layered architecture. Then MOM or other vdsm services can get the information from the proc file system like other OS statistics exported on the host.

You wouldn't use the host /proc filesystem for this purpose. /proc is an interface between userspace and the kernel. It is not for direct application use. The problem I see with hooking collectd up to ovirt-ga is that vdsm still needs a connection to ovirt-ga for things like shutdown and desktopLogin. Today vdsm owns the connection to the guest agent, and there is not a nice way to multiplex that connection for use by multiple clients simultaneously.

Actually, I don't like collecting statistics from the guest agent. Now libvirt can provide the statistics for vcpu, block and network interfaces. So I think we should reconsider enabling the guest memory report in the virtio balloon driver. I am not sure if async events are supported in QMP now. What do you think of it?
In vdsm and mom we don't just collect statistics; we also need to perform appropriate actions on them. So we probably still need an output plugin for collectd to make the data available to vdsm and mom, and to generate an event to vdsm or mom when the data reaches a given threshold. Just an idea; I am not sure how easy it is to implement.
Re: [vdsm] moving the collection of statistics to external process
On 12/05/2012 10:23 PM, ybronhei wrote:

As part of an issue where, if you press start for 200 VMs at the same time, it takes hours because of an undefined issue, we thought about moving the collection of statistics outside vdsm. It can help because stat collection runs in internal vdsm threads that can take up quite a bit of time.

I'm not sure if it would help with the issue of starting many VMs simultaneously, but it might improve vdsm response.

Currently we start a thread for each VM and then collect stats on them at constant intervals, and it must affect vdsm if we have 200 threads like this that can take some time. For example, if we have connection errors to storage and we can't receive its response, all 200 threads can get stuck and lock other threads (GIL issue).

As far as I know, the design of oop is to try to resolve the problem you state. However, I don't understand how the GIL can cause this problem. Python should release the GIL before executing any instruction involving I/O. I did some tests before and found that other threads can continue to run while one thread gets stuck on I/O.

I wanted to know what you think about it and if you have a better solution to avoid initiating so many threads. And is splitting vdsm a good idea here? At first look, my opinion is that it can help, and it would be nice to have a vmStatisticService that runs and writes the VMs' status to a separate log.

The problem with this solution is that if those interval functions need to communicate with internal parts of vdsm to set values or start internal processes when something has changed, it depends on the stat function... and I'm not sure that the stat function should control internal flows. Today, to recognize connectivity errors we count on this method, but we can add polling mechanisms for those issues (which can raise the same problems we are trying to deal with...).

I would like to hear your ideas and comments. Thanks.
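One way to avoid one sampling thread per VM is sketched below: a single scheduler round over a bounded worker pool, with a per-round timeout so a stuck storage or agent call marks that VM's stats as stale instead of pinning a thread. This is illustrative only and not vdsm code; the interval, pool size and function names are made up.

import concurrent.futures
import time

ROUND_TIMEOUT = 5   # give up on a sampling round after this many seconds
POOL_SIZE = 8       # bounded, instead of 200 threads for 200 VMs

def sample_vm(vm_id):
    """Placeholder for the real per-VM stats query (libvirt, guest agent, ...)."""
    return {"vm": vm_id, "ts": time.time()}

def sampling_round(vm_ids, pool):
    futures = {pool.submit(sample_vm, vm): vm for vm in vm_ids}
    done, not_done = concurrent.futures.wait(futures, timeout=ROUND_TIMEOUT)
    results = {futures[f]: f.result() for f in done if f.exception() is None}
    for f in not_done:
        results[futures[f]] = None  # stale: storage/agent did not answer in time
    return results

if __name__ == "__main__":
    vms = ["vm-%d" % i for i in range(200)]
    with concurrent.futures.ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
        print(len(sampling_round(vms, pool)))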
Re: [vdsm] Fedora, udev and nic renaming
On Thu, Dec 06, 2012 at 01:12:52PM +0200, Michael S. Tsirkin wrote:
> On Wed, Dec 05, 2012 at 03:51:39PM +0200, Dan Kenigsberg wrote:
> > On Tue, Dec 04, 2012 at 05:25:48AM -0500, Alon Bar-Lev wrote:
> > >
> > > Thanks for this verbose description.
> > >
> > > I don't think using libguestfs is the solution for this.
> >
> > Yeah, it seems like a hack that would be quite hard to maintain for all supported guest operating systems.
> >
> > > Fixing qemu to accept a BIOS interface name at the -net parameter is preferable. I don't think we should expose the interface as a PCI device, as it will have some drawbacks, but attempt to use the onboard convention.
> >
> > I don't see a real use case for setting the BIOS name explicitly. After all, libvirt/vdsm/Engine is going to allocate them according to their relative order. I'd be content with qemu providing a sane, reproducible biosdevname for each nic.
> >
> > Michael, would it be difficult to have?
>
> This is not a qemu issue. This is a biosdevname/VMware issue.
> biosdevname has this code:
>
> /*
>   Algorithm suggested by:
>   http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1009458
> */
> static int
> running_in_virtual_machine (void)
> {
>     u_int32_t eax=1U, ecx=0U;
>
>     ecx = cpuid (eax, ecx);
>     if (ecx & 0x80000000U)
>         return 1;
>     return 0;
> }
>
> So it just looks for a hypervisor.
>
> It should look at the hypervisor leaf and either blacklist vmware specifically or whitelist kvm.
>
> Please open a (preferably urgent prio) bugzilla for the biosdevname component so we can fix it in F18, and cc me.
> I can write you a patch but the maintainer needs to apply it.

Thanks for the analysis, Michael. Fedora bug opened:
Bug 884990 - non deterministic bios dev naming in KVM guests