On Wed, Jun 19, 2013 at 3:31 PM, Dimitris Aragiorgis <[email protected]> wrote:
> [+ lists]
>
> Hi Guido,
>
> * Guido Trotter <[email protected]> [2013-06-18 13:40:04 +0200]:
>
>> On Sat, May 25, 2013 at 6:41 PM, Dimitris Aragiorgis <[email protected]> wrote:
>> > * Dimitris Aragiorgis <[email protected]> [2013-05-25 12:06:51 +0300]:
>> >
>> >> > > >
>> >> > > > How do we guarantee that no two devices will have the same name? We
>> >> > > should
>> >> > > > double check this, to be sure there are no conflicts.
>> >> > > >
>> >> > > >
>> >> > >
>> >> > > Well we actually don't. Trimming system's generated uuid rises our 
>> >> > > chances
>> >> > > for duplicate IDs. If there is a way to generate a uuid of max 32 
>> >> > > chars
>> >> > > starting with a letter based on the uuids of type
>> >> > > /proc/sys/kernel/random/uuid
>> >> > > then it would be great. Until we find one,  we could disable hotplug 
>> >> > > for
>> >> > > devices
>> >> > > that have the uuid prefix.
>> >> > >
>> >> >
>> >> > That's going to be a problem, as it will generate errors for no reason.
>> >> > (sorry, you can't hotplug your device, because you were unlucky in the 
>> >> > uuid
>> >> > roulette. please try again being luckier)
>> >> >
>> >> > How about using some information that is specific to the instance and 
>> >> > can't
>> >> > change? eg.
>> >> >
>> >> > x<pci-id>_<start-of-uuid>
>> >> > or
>> >>
>> >> Well it seems that if we obtain the device pci slot before naming it then 
>> >> we
>> >> could definitely use it as an identifier. To this end, UUIDs are not a 
>> >> prereq.
>> >> And it seems to be doable. Good point.
>> >>
>> >
>> >
>> > Well after second thoughts and code digging, I think the device naming
>> > should derive from uuid only. That is because when we remove a device
>> > via an rpc the object that is passed to the hypervisor does not have any
>> > pci details (because master is unaware of hypervisor specific info). It
>> > must be a deterministic way to produce a unique name that remains
>> > unchanged (the uuid, for example, does not change during device life)...
>> >
>>
>> But the hypervisor then knows (and can dig up) the pci id, no?
>> I am a bit worried with the uuid only, due to the duplication problem.
>>
>
> The thing is that ganeti keeps track a list of devices. With hotplug
> support the order of that list may not be identical with the one inside
> the VM.  E.g.:
> We have an instance with [nic0, nic1, nic2].
> It gets booted with the same device order [pci0, pci1, pci2].
> He hot-remove nic1 so we have [nic0, nic2] and [pci0, pci2].
> Then we hot-add another device and get [nic0, nic2, nic3] and
> [pci0, pci3, pci2] because nic3 gets placed in nic1's pci slot
> (the first available).

Good.

>
> The hypervisor can only find a device based (a) on its name (new style
> hotplug command like netdev_add, etc) or (b) based on its pci slot (old
> style commands like pci_add, etc.).

Ack. Question: what happens to devices that are "already there"? Can
they be found somehow, if they don't have a name set (I suppose on the
kvm command line). Is there a way to set the name "post facto", and/or
to read the pci information from kvm without the reboot. (I would like
to avoid the reboot requirement, in particular for migration: it's ok
to say "you haven't rebooted, so no hotplug", but not "you haven't
rebooted, so no migrate" since migrate worked before: can we have
this?

>
> I have gone with (a) so we have to find a way to get a unique device name
> that will be used by kvm. I have used the first part of UUID which is not
> enough with your sayings.
>
> Runtime files are used to keep track the configuration of a running
> instance.  It means that the device lists that master knows are not the
> same with the ones kept in runtime files.
>
> In current design runtime file updates takes place after a successful
> hotplug action.  In case of hot-add, we ask the hypervisor for the first
> available pci slot, we name the device after its uuid, we hot-add the device,
> and we append the newly added device to the existing list in runtime file.
>
> In case of hot-remove we get the name of the device (from its uuid),
> hot-remove it, and then parse the existing list in runtime file and
> remove the device with the corresponding uuid.
>
> In my opinion this scenario is more elegant but we have the duplication
> corner-case, which one must be sooooo unlucky to trigger....
>
>
> This can change slightly and we can use the device's pci slot for its name!
> So I propose the following:
>
> In case of hot-add, we ask the hypervisor for the first
> available pci slot, we name the device (something like nic-pci-<slot>), we 
> hot-add
> it based on that name and then we append it to the corresponding list inside
> runtime file.
>

Can we have nic-<uuid>-<slot>?
(where uuid is only part of the uuid, of course).
This should help with any bug targeting the wrong device: since the
uuid is in the config, if the uuid doesn't match, there's a problem.

> In case of hot-remove, we parse the runtime list, find the device based on
> it uuid, we get its pci slot and hot-remove it based on its name 
> (nic-pci-<slot>).
> Then we parse the list in runtime file and find the device the entry based on
> its pci and remove it.
>
> With other words we save the pci slot of each device in runtime files and 
> retrieve
> it from there when needed.

Ack, but let's have part of the UUID in the name for extra security
(this way a device can never be associated with a wrong pci slot by
mistake).

>
> In case you do not want to add another slot in NIC and Disk objects this
> info could be stored as a tupple of (NIC.ToDict, <pci_slot>) but I think
> the code will be more complicated.. Additionally because we would
> refer to devices based on their uuids, the lists should be changed to dicts
> with uuids as keys.
>
> What do you think about all that?
>

Let's say we add the slot, then, but we need to guarantee that in
config.py it's *never* populated.
Aka config.py should wipe it out every time it's saving the config, or
getting an updated object to write, so that it can never leak there.


>> >> > x<arbitrary-integer>_<start-of-uuid> (where the arbitrary integer is 
>> >> > just
>> >> > used for deduplication)
>> >> >
>> >> >
>> >
>> > Arbitrary integer cannot be used because device name should be anytime
>> > reproduce-able.
>> >
>>
>> Yes, unless you have them saved or readable.
>>
>> >>
>> >> OK. Sasha's feedback is more than helpful. I'll try to address the issues 
>> >> that
>> >> are raised, but still I think that those could extend the initial 
>> >> implementation
>> >> discussed here and do not affect the basic design.
>> >>
>> >>
>> >> To sum up:
>> >>
>> >> 1) pci slot of each device must be saved in backend for the sake of
>> >> migrations.  Should this be a slot in NIC, Disk objects or should it be
>> >> saved as a tupple in serialized data?
>> >>
>>
>> Again, are you sure it can't just be taken from kvm on the source node
>> before migration, instead of saving it?
>>
>
> No you cannot. See above. It must be saved somewhere..
>
>> >> 2) if we want disk hotplug, the format of runtime files should change. 
>> >> block_devices
>> >> should be treated separately, just like NICs. Migrating old instances 
>> >> without first
>> >> rebooting them still remains an issue.
>> >>
>>
>> Still not entirely sure here. But let's discuss the above point first.
>>
>
> We definitely have to split disks from the cmd part. See above.
>
>> >> 3) merge all rpc calls in one hotplug_device(device, action)
>>
>> Ack.
>>
>> >>
>> >> 4) do not include disk resizing in first implementation.
>> >>
>>
>> Ack.
>>
>> >> 5) add hv param for enabling hotplug or not
>>
>> Ack.
>>
>> >> 6) warnings in case of disk/nic live removal
>> >>
>>
>> Ack.
>>
>> >> 7) warnings in case hotplug is not supported but --hotplug option is 
>> >> passed
>> >>
>>
>> Ack.
>>
>
> OK with the ACKs
>
>> >> 8) address Sasha's concerns:
>> >>    * Parse kvm help output to see if hotplug is supported.
>> >>    * Dig into the case where an instance does not run as a root.
>> >>
>>
>> Definitely.
>>
>> >>
>> > 9) address the uuid(or something)-to-kvm_device_name deterministic 
>> > function.
>> >
>>
>> Yes, this one worries me a bit. I would be OK with "start-of-uuid" as
>> long as the start of uuid is guaranteed (elsewhere) to be unique.
>>
>
> The new design (see above) address it adequately. The deterministic function
> can be <device type>-pci-<device slot>.
>
>
> Looking forward to your comments.
>

Thanks a lot,

Guido

Reply via email to