Hi all,

We've been using Young Oh's OpenStack module and we've run into an
interesting bug/behaviour. Below I use "virtual machine" to refer to the
entry for the computer in VCL and "instance" for the actual virtual machine
running in OpenStack.

When the module terminates an OpenStack instance, it finds the instance ID
by searching for its private IP address. The IP address it uses
is determined using the get_computer_private_ip_address function in VCL
which first looks in the data structure, then the hosts file and finally
the database.
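
To make that lookup order concrete, here is a rough Python sketch. It is not
the VCL code (which is Perl); resolve_private_ip and the three dicts are
hypothetical stand-ins for the data structure, the hosts file and the
database.

```python
from typing import Optional


def resolve_private_ip(
    computer: str,
    reservation_data: dict,
    hosts_file: dict,
    database: dict,
) -> Optional[str]:
    """Return the first non-empty IP found, checking sources in priority order.

    Hypothetical stand-in for the lookup order described above:
    data structure first, then the hosts file, then the database.
    """
    for source in (reservation_data, hosts_file, database):
        ip = source.get(computer)
        if ip:  # skips missing keys and blank values alike
            return ip
    return None


# First-ever reservation: only the database has a (bogus) value.
first_time = resolve_private_ip("vm1", {}, {}, {"vm1": "10.0.0.99"})

# With the database field left blank, nothing is returned instead.
blank_db = resolve_private_ip("vm1", {}, {}, {"vm1": ""})
```

The point of the sketch is that whatever sits in the last source wins when
the earlier ones are empty, so a placeholder value there is treated as real.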

If I understand correctly, since the IP address lives in the reservation
part of the data structure, it's only going to be available when there's an
active reservation (could someone confirm?). Up to this point we haven't
been populating the hosts file ourselves because the OpenStack module takes
care of that itself, but this means the IP address won't be present in the
hosts file until that virtual machine has been reserved for the first time.
Finally, we've just been putting bogus values in the database for the IP
address, since it will change every time a new instance is created.

The problem is that the database is (obviously) inaccurate and the hosts
file can become inaccurate, which I believe causes the following
problematic situations:

1. A virtual machine is reserved for the first time, which causes the
OpenStack module to use the IP address in the database since it's not
present anywhere else. This IP is almost guaranteed to be incorrect, and
any instance that happens to be using it will be terminated. This is what
caused us problems; what we should have done instead was leave the fields
blank in the database.

2. Since the OpenStack module only updates the hosts file when load is
called, the entry can go out of date: if the instance is terminated, its IP
is released for use by a new instance. If that happens, the new instance
would be terminated if a new reservation was made for the virtual machine
corresponding to the old instance. From what I can tell this shouldn't
occur during normal operation, but it could still conceivably happen.
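
As a toy illustration of the second situation, here is a small Python
simulation. Everything in it is made up for the example (the instance IDs,
the addresses and find_instance_by_ip); it only mimics the idea of picking
an instance by its private IP.

```python
def find_instance_by_ip(instances, ip):
    """Return the ID of the instance currently holding `ip`, or None."""
    for instance_id, instance_ip in instances.items():
        if instance_ip == ip:
            return instance_id
    return None


# Hypothetical cloud state: instance ID -> private IP.
instances = {"inst-old": "10.0.0.5"}
# Hosts file entry written the last time "vm1" was loaded.
hosts_file = {"vm1": "10.0.0.5"}

# The old instance goes away, which releases its address...
del instances["inst-old"]
# ...and an unrelated new instance is handed the same address.
instances["inst-new"] = "10.0.0.5"

# A new reservation for vm1 looks up the old address and selects
# the unrelated instance for termination.
victim = find_instance_by_ip(instances, hosts_file["vm1"])
```

Here victim ends up being the new, unrelated instance, which is the
collision I'm worried about.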

I'm pretty sure I've got these details right, but please correct any gaps
in my understanding. Being affected by the first problem was a mistake on
our part, but this is still something that could probably be handled in a
more consistent way. The best solution would probably be to modify the
OpenStack module to identify instances primarily by UUID rather than by
private IP address, which would remove the possibility of collisions. Does
this sound right to everyone?
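
For what it's worth, here is a minimal Python sketch of what I mean by
keying on the UUID. It assumes the module can record the UUID somewhere at
launch time; where that would actually be stored in VCL is an open question,
and InstanceTracker and its methods are hypothetical names, not existing
code.

```python
import uuid


class InstanceTracker:
    """Hypothetical record of which instance UUID backs each virtual machine."""

    def __init__(self):
        self._uuid_by_computer = {}

    def record_launch(self, computer, instance_uuid):
        # Called when a new instance is created for a virtual machine.
        self._uuid_by_computer[computer] = instance_uuid

    def instance_to_terminate(self, computer, running_uuids):
        # Only return a UUID that was recorded for this virtual machine
        # and is still running; never guess from an IP address.
        instance_uuid = self._uuid_by_computer.get(computer)
        if instance_uuid in running_uuids:
            return instance_uuid
        return None


tracker = InstanceTracker()
old_uuid = str(uuid.uuid4())
new_uuid = str(uuid.uuid4())
tracker.record_launch("vm1", old_uuid)

# The old instance is gone and an unrelated one now holds its former IP.
running = {new_uuid}
target = tracker.instance_to_terminate("vm1", running)
```

Because the unrelated instance has a different UUID, target is None here and
nothing gets terminated, regardless of which IP it was assigned.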

Cameron Mann
