On Wed, Jun 19, 2013 at 3:31 PM, Dimitris Aragiorgis <[email protected]> wrote: > [+ lists] > > Hi Guido, > > * Guido Trotter <[email protected]> [2013-06-18 13:40:04 +0200]: > >> On Sat, May 25, 2013 at 6:41 PM, Dimitris Aragiorgis <[email protected]> wrote: >> > * Dimitris Aragiorgis <[email protected]> [2013-05-25 12:06:51 +0300]: >> > >> >> > > > >> >> > > > How do we guarantee that no two devices will have the same name? We >> >> > > should >> >> > > > double check this, to be sure there are no conflicts. >> >> > > > >> >> > > > >> >> > > >> >> > > Well we actually don't. Trimming system's generated uuid rises our >> >> > > chances >> >> > > for duplicate IDs. If there is a way to generate a uuid of max 32 >> >> > > chars >> >> > > starting with a letter based on the uuids of type >> >> > > /proc/sys/kernel/random/uuid >> >> > > then it would be great. Until we find one, we could disable hotplug >> >> > > for >> >> > > devices >> >> > > that have the uuid prefix. >> >> > > >> >> > >> >> > That's going to be a problem, as it will generate errors for no reason. >> >> > (sorry, you can't hotplug your device, because you were unlucky in the >> >> > uuid >> >> > roulette. please try again being luckier) >> >> > >> >> > How about using some information that is specific to the instance and >> >> > can't >> >> > change? eg. >> >> > >> >> > x<pci-id>_<start-of-uuid> >> >> > or >> >> >> >> Well it seems that if we obtain the device pci slot before naming it then >> >> we >> >> could definitely use it as an identifier. To this end, UUIDs are not a >> >> prereq. >> >> And it seems to be doable. Good point. >> >> >> > >> > >> > Well after second thoughts and code digging, I think the device naming >> > should derive from uuid only. That is because when we remove a device >> > via an rpc the object that is passed to the hypervisor does not have any >> > pci details (because master is unaware of hypervisor specific info). It >> > must be a deterministic way to produce a unique name that remains >> > unchanged (the uuid, for example, does not change during device life)... >> > >> >> But the hypervisor then knows (and can dig up) the pci id, no? >> I am a bit worried with the uuid only, due to the duplication problem. >> > > The thing is that ganeti keeps track a list of devices. With hotplug > support the order of that list may not be identical with the one inside > the VM. E.g.: > We have an instance with [nic0, nic1, nic2]. > It gets booted with the same device order [pci0, pci1, pci2]. > He hot-remove nic1 so we have [nic0, nic2] and [pci0, pci2]. > Then we hot-add another device and get [nic0, nic2, nic3] and > [pci0, pci3, pci2] because nic3 gets placed in nic1's pci slot > (the first available).
Good. > > The hypervisor can only find a device based (a) on its name (new style > hotplug command like netdev_add, etc) or (b) based on its pci slot (old > style commands like pci_add, etc.). Ack. Question: what happens to devices that are "already there"? Can they be found somehow, if they don't have a name set (I suppose on the kvm command line). Is there a way to set the name "post facto", and/or to read the pci information from kvm without the reboot. (I would like to avoid the reboot requirement, in particular for migration: it's ok to say "you haven't rebooted, so no hotplug", but not "you haven't rebooted, so no migrate" since migrate worked before: can we have this? > > I have gone with (a) so we have to find a way to get a unique device name > that will be used by kvm. I have used the first part of UUID which is not > enough with your sayings. > > Runtime files are used to keep track the configuration of a running > instance. It means that the device lists that master knows are not the > same with the ones kept in runtime files. > > In current design runtime file updates takes place after a successful > hotplug action. In case of hot-add, we ask the hypervisor for the first > available pci slot, we name the device after its uuid, we hot-add the device, > and we append the newly added device to the existing list in runtime file. > > In case of hot-remove we get the name of the device (from its uuid), > hot-remove it, and then parse the existing list in runtime file and > remove the device with the corresponding uuid. > > In my opinion this scenario is more elegant but we have the duplication > corner-case, which one must be sooooo unlucky to trigger.... > > > This can change slightly and we can use the device's pci slot for its name! > So I propose the following: > > In case of hot-add, we ask the hypervisor for the first > available pci slot, we name the device (something like nic-pci-<slot>), we > hot-add > it based on that name and then we append it to the corresponding list inside > runtime file. > Can we have nic-<uuid>-<slot>? (where uuid is only part of the uuid, of course). This should help with any bug targeting the wrong device: since the uuid is in the config, if the uuid doesn't match, there's a problem. > In case of hot-remove, we parse the runtime list, find the device based on > it uuid, we get its pci slot and hot-remove it based on its name > (nic-pci-<slot>). > Then we parse the list in runtime file and find the device the entry based on > its pci and remove it. > > With other words we save the pci slot of each device in runtime files and > retrieve > it from there when needed. Ack, but let's have part of the UUID in the name for extra security (this way a device can never be associated with a wrong pci slot by mistake). > > In case you do not want to add another slot in NIC and Disk objects this > info could be stored as a tupple of (NIC.ToDict, <pci_slot>) but I think > the code will be more complicated.. Additionally because we would > refer to devices based on their uuids, the lists should be changed to dicts > with uuids as keys. > > What do you think about all that? > Let's say we add the slot, then, but we need to guarantee that in config.py it's *never* populated. Aka config.py should wipe it out every time it's saving the config, or getting an updated object to write, so that it can never leak there. >> >> > x<arbitrary-integer>_<start-of-uuid> (where the arbitrary integer is >> >> > just >> >> > used for deduplication) >> >> > >> >> > >> > >> > Arbitrary integer cannot be used because device name should be anytime >> > reproduce-able. >> > >> >> Yes, unless you have them saved or readable. >> >> >> >> >> OK. Sasha's feedback is more than helpful. I'll try to address the issues >> >> that >> >> are raised, but still I think that those could extend the initial >> >> implementation >> >> discussed here and do not affect the basic design. >> >> >> >> >> >> To sum up: >> >> >> >> 1) pci slot of each device must be saved in backend for the sake of >> >> migrations. Should this be a slot in NIC, Disk objects or should it be >> >> saved as a tupple in serialized data? >> >> >> >> Again, are you sure it can't just be taken from kvm on the source node >> before migration, instead of saving it? >> > > No you cannot. See above. It must be saved somewhere.. > >> >> 2) if we want disk hotplug, the format of runtime files should change. >> >> block_devices >> >> should be treated separately, just like NICs. Migrating old instances >> >> without first >> >> rebooting them still remains an issue. >> >> >> >> Still not entirely sure here. But let's discuss the above point first. >> > > We definitely have to split disks from the cmd part. See above. > >> >> 3) merge all rpc calls in one hotplug_device(device, action) >> >> Ack. >> >> >> >> >> 4) do not include disk resizing in first implementation. >> >> >> >> Ack. >> >> >> 5) add hv param for enabling hotplug or not >> >> Ack. >> >> >> 6) warnings in case of disk/nic live removal >> >> >> >> Ack. >> >> >> 7) warnings in case hotplug is not supported but --hotplug option is >> >> passed >> >> >> >> Ack. >> > > OK with the ACKs > >> >> 8) address Sasha's concerns: >> >> * Parse kvm help output to see if hotplug is supported. >> >> * Dig into the case where an instance does not run as a root. >> >> >> >> Definitely. >> >> >> >> > 9) address the uuid(or something)-to-kvm_device_name deterministic >> > function. >> > >> >> Yes, this one worries me a bit. I would be OK with "start-of-uuid" as >> long as the start of uuid is guaranteed (elsewhere) to be unique. >> > > The new design (see above) address it adequately. The deterministic function > can be <device type>-pci-<device slot>. > > > Looking forward to your comments. > Thanks a lot, Guido
